Abstract


Tools  for  automatic  program  analysis  promise  to  improve  programmer  productivity  by 
searching  and  summarizing  large  bodies  of  code.  However,  the  phenomenon  of  aliasing  — 
different  names  being  used  to  refer  to  the  same  data  —  reduces  the  effectiveness  of  simple 
textual  analyses.  This  dissertation  describes  the  design  of  a  system,  Ajax,  that  addresses  this 
problem  by  using  semantics-based  program  analysis  as  the  basis  for  a  number  of  different 
tools  to  aid  Java  programmers. 

To  enable  the  construction  of  many  tools,  Ajax  imposes  a  clean  separation  between  analysis 
engines  that  produce  alias  information  and  tools  that  consume  it.  Analyses  are  treated  as 
“black  boxes”  satisfying  a  simple,  formal  specification  given  in  terms  of  the  semantics  of 
Java  bytecode.  Knowing  only  this  specification,  one  can  build  many  different  tools  with 
only  a  small  amount  of  code.  The  thesis  explores  the  flexibility  and  efficiency  of  the  design 
by  describing  the  construction  and  evaluation  of  several  different  tools:  tools  to  find  dead 
code,  resolve  Java  virtual  method  calls,  statically  check  Java  downcasts,  search  for  accesses 
to  objects,  and  build  object  models. 

To  support  these  tools,  Ajax  includes  a  novel  static  analysis  engine  for  Java  called  SEMI, 
based  on  type  inference  with  polymorphic  recursion.  SEMI  provides  fully  context  sensitive 
analysis  of  large  programs.  Using  SEMI  with  the  downcast  checking  tool,  Ajax  can  prove 
the  safety  of  more  than  50%  of  the  downcast  instructions  in  some  real-life  Java  programs, 
such  as  Sun’s  bytecode  disassembler  and  the  JavaCC  parser  generator.  Ajax  is  the  first 
system  to  address  this  particular  task. 

One  of  the  key  goals  of  this  thesis  is  to  study  issues  bearing  on  the  practical  utility  of  static 
analysis  tools  for  programmers.  This  document  describes  some  of  the  challenges  involved 
in  building  an  analysis  system  for  off-the-shelf  Java  applications,  and  suggests  some 
possible  avenues  for  future  research. 
1 Introduction


1.1  Setting 

1.1.1  Software  Engineering  and  Alias  Analysis 

Building  large,  complex  software  systems  is  difficult.  Human  beings  have  limited  capacity 
to  understand  and  recall  the  details  of  such  systems.  Since  computers  are  adept  at  handling 
large  quantities  of  data,  one  would  expect  automatic  tools  to  be  useful  for  helping 
programmers  to  understand  large  programs. 

Indeed, many such tools do exist. Program code is partitioned into files and organized using
file systems. Data about programs are stored in bug databases [88] and design documents
[70].

In my thesis, I focus on tools that work directly with program code. A key phenomenon that
makes program code difficult to understand is aliasing: the use of multiple names to refer
to the same entity. For example, consider the fragment of Java code shown in Figure 1-1.
In this code, a reference to the string object “Hello” is stored in s1 and inserted into the
Vector, and then extracted into s. Therefore the variables s and s1 are aliased. Likewise
s and s2 are aliased.


    static void main() {
        String s1 = "Hello";
        String s2 = "Kitty";
        Vector v = new Vector();        // Create a new Vector containing
        v.addElement(s1);               // s1 and s2, and print out its
        v.addElement(s2);               // elements
        Integer i1 = new Integer(7);
        Vector v2 = new Vector();
        v2.addElement(i1);
        for (Enumeration e = v.elements(); e.hasMoreElements(); ) {
            String s = (String)e.nextElement();
            System.out.println(s.length());
        }
    }

Figure  1-1.  Example  of  Java  code  exhibiting  aliasing 

Suppose the programmer wants to find out information about the object referred to by s1:
for example, what methods are called on it, and where in the program those calls occur.
It is insufficient to search the text for the name “s1”. The programmer must also examine
s1’s aliases (in this case, s). In general, whenever the programmer is interested in
properties of data which may be accessed through different names, alias information is
required.

Most tools for understanding code make no attempt to handle aliasing. The programmer
must manually peruse the source code to discover aliasing relationships and to gather
information about the referenced data. This thesis describes the design of a practical alias
analysis system for a modern programming language (Java), and code understanding tools
based on it.

1.1.2  The  Need  For  Alias  Information 

Many different questions which arise during programming involve alias information.
Consider these questions that a programmer might ask (see footnote 1):

1.  “What  kind  of  objects  can  be  in  the  container  X?” 

2.  “What  does  the  structure  of  object  X  and  its  contents  look  like?” 

3.  “Which  methods  of  object  X  are  invoked,  and  where  are  they  called?” 

4.  “Is  this  line  of  code  ever  executed  or  not?” 

The  programmer  might  specify  “object  X”  by  giving,  for  example,  a  program  location  and 
the  name  of  a  variable  in  scope  at  that  location. 

All of these questions require alias information. Questions 1, 2 and 3 clearly require
information about objects; collecting this information will require knowledge of which names
refer to the objects of interest. In an object-oriented setting, question 4 also requires alias
information because tracing the flow of control requires information about objects that are
targets of method invocations.

This  thesis  demonstrates  that  not  only  do  these  questions  require  alias  information,  but  once 
alias  information  is  available  in  a  convenient  format,  these  questions  are  relatively  easy  to 
answer. 

1.1.3  Shortcomings  of  Existing  Tools 

Existing practical tools use very simple approximations whenever they need alias
information. A common and useful approximation is to compare the declared types of variables
to see whether they may be aliases [23]. For example, in Figure 1-1, the Vector v and the
String s cannot be aliases because the Java class hierarchy does not permit any object to
be simultaneously a String and a Vector.
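The declared-type approximation can be illustrated with a minimal sketch. The class and method names below (`DeclaredTypeAliasing`, `mayAlias`) are hypothetical and are not part of Ajax or of the tools cited above; the sketch also assumes both declared types are classes rather than interfaces, since two unrelated interfaces can share an implementing class.

```java
import java.util.Vector;

// Hypothetical helper for illustration only: a declared-type alias approximation.
class DeclaredTypeAliasing {
    // Two references may alias only if some object could be an instance of
    // both declared types. For class types (the assumption made here), that
    // requires one type to be equal to, or a superclass of, the other.
    static boolean mayAlias(Class<?> a, Class<?> b) {
        return a.isAssignableFrom(b) || b.isAssignableFrom(a);
    }

    public static void main(String[] args) {
        // Vector v and String s from Figure 1-1: unrelated classes.
        System.out.println(mayAlias(Vector.class, String.class)); // false
        // Two variables both declared as Vector are always reported
        // as possible aliases, whatever they actually hold.
        System.out.println(mayAlias(Vector.class, Vector.class)); // true
    }
}
```

The second result shows the limit of this approximation: the declared types alone cannot separate two variables of the same container type.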

However, code reuse frequently leads to different instances of the same type being used in
different ways. For example, in Figure 1-1 v and v2 are Vectors, a generic container
type frequently used in Java. Suppose the programmer wishes to prove that the Vector v in
Figure 1-1 contains only Strings. She must find all aliases to v and show that the objects
inserted into those Vectors are Strings. An alias analysis based on declared types
alone will imply that v and v2 are aliases, and therefore v’s Vector might contain
Integers as well as Strings. Such an analysis will inaccurately conclude that the
downcast to String might fail.

1. These questions are all phrased in terms of object-oriented programs, but similar
questions and observations apply to programs written in C, or any modern programming
language.

Researchers  have  devised  much  more  sophisticated  alias  analyses.  However,  the  fruits  of 
this  research  are  not  being  used  by  production-line  programmers.  The  motivation  for  this 
thesis  is  to  attack  this  adoption  barrier. 

Therefore  I  have  constructed  a  program  analysis  system  called  Ajax.  The  design  goals  of 
Ajax  reflect  perceived  limitations  of  previous  attempts  at  implementing  analysis  tools. 

•  Scalability 

An  analysis  that  produces  wonderfully  detailed  information  will  be  useless  if  it  is 
unable  to  handle  large  programs.  If  a  program  is  small  enough  to  be  easily  understood 
by  a  programmer,  then  the  programmer  does  not  need  an  analysis  tool. 

•  Applicability 

Many analyses are not useful because they do not deal well with features of modern
programming languages and modern programs, such as:

•  Higher  order  control  flow  and  dynamic  method  dispatch; 

•  Ubiquitous  dynamic  memory  allocation; 

•  Large,  complex  dynamic  data  structures; 

•  Multiple  levels  of  data  encapsulation; 

• Class library code used in multiple contexts.

Ajax is designed to handle programs written in a modern language with all these
features (Java) and is specifically designed to handle these features well.

•  Usability 

Previous work such as Lackwit [54] erred by exposing the results of analysis very
directly to the user, with little summarization or interpretation. It was often unclear to a
normal programmer how the results should be interpreted. Therefore, instead of
building a single monolithic tool, Ajax is designed to be a platform upon which a variety of
tools can be built, each addressing a particular kind of task or question that the
programmer may pose. The user interface to each tool is customized for its particular
function.

An  additional  implied  design  goal  is  that  Ajax  must  be  powerful  enough  to  be  worth  using 
while  meeting  the  above  requirements.  At  the  least,  it  must  discover  useful  information  that 
could  not  be  obtained  by  simple  methods  based  on  local  reasoning.  This  thesis  shows  how 
Ajax  achieves  all  these  goals  simultaneously. 

1.1.4  Assumptions 

Apart  from  the  requirements  above,  the  design  of  Ajax  was  constrained  by  assumptions 
about  the  nature  of  the  solution.  These  assumptions  stemmed  from  the  background  of  this 
work,  and  have  some  independent  justification,  but  are  not  fundamental. 
• Sound Static Analysis

Ajax is designed to produce static guarantees: results that are valid for all possible
inputs and executions of the program. Therefore it must use conservative analysis. For
example, when finding the sites of all method invocations on a particular object or set
of objects, it only promises to return a superset of the true sites. One justification for
using sound analysis is that the meaning of the results is easier to define; the results do
not need to be qualified by the limits of a test suite or the nature of heuristics used by
the system. Also, for some applications, such as compilation or automatic
transformation, it is intrinsically important that the results be sound. However, an analysis need
not be sound to be useful, so the choice to explore this part of the design space was not
a necessary decision.

•  Global  Analysis 

Ajax analyzes whole programs. The behavior of any unavailable parts must be
represented by specifications. This is desirable because behaviors due to component
interactions are often the most difficult to understand, and therefore the most useful to be able
to analyze automatically. Also, sound analysis of partial programs requires some sort of
description of the missing parts, or else one must make “worst case” assumptions about
those parts. The quality of the analysis results is likely to be severely degraded by such
pessimistic assumptions.

1.1.5  Goal 

The  goal  of  this  thesis  is  to  demonstrate  that  sound,  static,  global  alias  analysis  can  be 
the  basis  for  tools  that  accurately  answer  programmers’  questions  about  real,  large 
object-oriented  programs. 

By  “accurately”,  I  mean  that  the  results  are  significantly  more  accurate  than  those  provided 
by  existing  tools. 

1.2  Approach 

Ajax  incorporates  several  key  features  to  achieve  the  above  goal. 

1.2.1  Support  For  Multiple  Tools  and  Analyses 

The  key  to  the  design  of  Ajax  is  its  division  into  tools  and  analyses.  In  Ajax,  a  tool  is  a 
component  presenting  a  single  interface  to  the  user  (typically,  a  programmer),  designed  to 
aid  the  user  in  a  specific  task  by  providing  specific  information  in  a  specific  way.  An 
analysis  is  a  component  that  produces  alias  information  to  be  consumed  by  tools.  Each 
analysis  implements  a  simple,  fixed,  and  rigorously  defined  interface,  which  presents 
aliasing  information  to  tools  in  the  form  of  an  abstraction  called  the  value-point  relation  (or 
VPR).  This  is  illustrated  in  Figure  1-2. 

Figure 1-2. Example of an Ajax configuration

This design has major benefits:


• One can use Ajax to construct one tool for each specific task that requires alias
information. Ajax is carefully organised so that each tool requires little effort to implement.
In particular, unlike some other analysis toolkits such as BANE [28], knowledge of the
semantics of the target language is built into Ajax’s analyses and does not have to be
provided by the tool.

• Ajax offers a suite of different analysis engines. One can select an engine for a given
problem to achieve an appropriate tradeoff between accuracy and resource
consumption. Results show that the appropriate analysis configuration varies significantly
according to the task being addressed. Because the VPR interface is fixed and fully
defined, there are no fundamental restrictions on combining analyses with tools; any
tool will operate correctly with any analysis. A given combination may or may not give
good quality results, but it will give correct results.

•  Ajax  allows  composition  of  analyses.  Two  analyses  can  be  “intersected”  to  combine  the 
best  results  of  both  to  solve  a  particular  problem.  Alternatively,  one  analysis  can  be 
used  as  a  “preprocessing  step”  to  provide  information  that  will  speed  up  or  improve  the 
accuracy  of  another  analysis.  These  capabilities  are  both  crucial  to  good  performance 
and  accuracy  in  Ajax.  To  implement  composition,  an  analysis  simply  uses  the  VPR 
interface  to  consume  alias  information  produced  by  one  or  more  other  analyses.  One 
such  configuration  is  illustrated  in  Figure  1-3  below. 

Conceptually,  the  value-point  relation  is  simply  the  aliasing  relation  between  program 
variables  (and  expressions).  The  difficult  part  of  the  design  is  defining  a  concrete  interface 
connecting  tools  to  analyses  that  allows  efficient,  simple  implementations  of  both.  The 
VPR  also  generalizes  alias  analysis  to  provide  information  about  values  which  are  not 
object  references  —  e.g.,  integers.  The  details  are  explained  in  Chapter  3  and  Chapter  4. 
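The separation of tools from analyses, and the “intersection” of two analyses, can be sketched as follows. This is a heavily simplified illustration under stated assumptions: the names `AliasAnalysis`, `mayAlias`, `intersect`, `fromLabels` and `countAliases` are hypothetical, expressions are represented by plain strings, and the real VPR interface (Chapters 3 and 4) is richer than a single boolean query.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the boundary between analyses and tools.
interface AliasAnalysis {
    // A sound analysis returns true whenever the two expressions can
    // refer to the same value in some execution (it may over-report).
    boolean mayAlias(String exprA, String exprB);
}

class AnalysisComposition {
    // Intersection: report a pair only if both analyses report it. The
    // result is still sound, because neither input ever misses a real alias.
    static AliasAnalysis intersect(AliasAnalysis first, AliasAnalysis second) {
        return (a, b) -> first.mayAlias(a, b) && second.mayAlias(a, b);
    }

    // Toy analysis: expressions with the same label are possible aliases.
    // Every queried expression is assumed to have a label.
    static AliasAnalysis fromLabels(Map<String, String> labels) {
        return (a, b) -> labels.get(a).equals(labels.get(b));
    }

    // A toy "tool": it sees only the interface, so any analysis can be plugged in.
    static long countAliases(AliasAnalysis analysis, String target, List<String> exprs) {
        return exprs.stream().filter(e -> analysis.mayAlias(target, e)).count();
    }

    public static void main(String[] args) {
        List<String> exprs = List.of("v", "v2", "s1", "i1");
        AliasAnalysis byDeclaredType = fromLabels(
            Map.of("v", "Vector", "v2", "Vector", "s1", "String", "i1", "Integer"));
        // An invented second analysis that separates v from v2 but merges s1 with i1.
        AliasAnalysis other = fromLabels(Map.of("v", "A", "v2", "B", "s1", "C", "i1", "C"));
        System.out.println(countAliases(byDeclaredType, "v", exprs));                   // 2
        System.out.println(countAliases(intersect(byDeclaredType, other), "v", exprs)); // 1
    }
}
```

Because the tool depends only on the interface, swapping or composing analyses changes the precision of its answers but not their correctness.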

The  design  is  exercised  by  constructing  multiple  analysis  engines  (see  Section  1.2.3 
below),  and  tools  for  the  following  tasks: 

• Proving the safety of Java downcasts

• Identifying dead code

•  Resolving  virtual  method  calls 

•  Computing  object  models 

•  Scanning  the  program  for  accesses  to  objects  satisfying  certain  criteria 

1.2.2  Support  For  Java  Programs 

As  mentioned  above,  Ajax  is  designed  to  handle  general  Java  programs.  Java  programs 
exhibit  a  variety  of  “modern”  language  features  that  are  becoming  common: 

•  Objects  —  that  is,  inheritance,  dynamic  method  dispatch,  and  data  abstraction 

• Extensive use of class libraries, such as the Java standard library and the Abstract
Window Toolkit user-interface and graphics library

• Well-defined semantics; the language specification defines the behavior of all Java
code

• Reflection and dynamic loading; Java programs can dynamically load new code at run
time, and metadata describing and providing access to loaded code and data is exported
to the running program

•  Exceptions 

•  Thread-based  concurrency 

To  simplify  the  presentation  and  implementation,  Ajax  actually  processes  Java  bytecode 
programs.  This  also  makes  it  possible  for  Ajax  to  process  programs  whose  source  code  is 
not  available. 

1.2.3  Simple  Context  Sensitive  Analysis 

To  give  significantly  more  accurate  results  than  local  analyses  such  as  those  based  on 
declared  types,  an  alias  analysis  must  be  able  to  distinguish  between  different  data  accessed 
with  the  same  variable/type  names.  In  complex  programs,  the  interesting  data  are  often 
constructed  and  accessed  through  one  or  more  levels  of  indirection.  For  example,  in  object 
oriented  programs,  patterns  such  as  constructors,  abstract  factories,  and  field  access 
methods  are  ubiquitous.  For  these  programs,  some  context  sensitive  analysis  is  required. 

The goal is not to have the most sophisticated analysis, but rather one that significantly
improves on existing fast analyses by providing context sensitivity. Therefore I chose to
base Ajax’s primary analysis on the simplest analysis with a high degree of context
sensitivity: Hindley-Milner style polymorphic type inference [49].

Hindley-Milner type inference is the basis for type inference in Standard ML [50]. The
basic idea of applying this procedure to analyze aliasing in Java programs is to erase the
declared types of variables, and perform type inference based only on the type constraints
induced by operators used in the program code. The inferred type information is used to
resolve aliasing questions in much the same way that declared type information is used.
However, inferred types give more precise information than declared types, because the
inferred types can be finer and their type system richer, by virtue of polymorphism. For
example, in Figure 1-1 Ajax can automatically prove that the Vector v contains only
Strings, and therefore the downcast cannot fail. This example requires context sensitive
analysis (see Section 2.2.2); no other comparable system provides it.
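The effect of polymorphism on the Figure 1-1 example can be sketched with a toy unification-based inference. This is only an illustration under strong simplifying assumptions, not SEMI: type variables are integers in a union-find structure, `addElement` is modeled by a two-variable signature whose element and parameter types are equal, and polymorphism is modeled by giving each call site a fresh copy of that signature. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch: alias analysis by unification of erased types.
class UnificationSketch {
    private final List<Integer> parent = new ArrayList<>();

    // Creates a new type variable that is initially its own representative.
    int fresh() { parent.add(parent.size()); return parent.size() - 1; }

    int find(int x) {
        while (parent.get(x) != x) x = parent.get(x);
        return x;
    }

    // Equality constraint: both variables denote the same inferred type.
    void unify(int a, int b) { parent.set(find(a), find(b)); }

    boolean same(int a, int b) { return find(a) == find(b); }

    // Models the two calls v.addElement(s1) and v2.addElement(i1), and reports
    // whether the element types of v and v2 end up merged.
    static boolean vectorsShareElementType(boolean polymorphic) {
        UnificationSketch u = new UnificationSketch();
        int s1 = u.fresh(), i1 = u.fresh();        // types of the inserted values
        int vElem = u.fresh(), v2Elem = u.fresh(); // element types of v and v2
        int sigElem = u.fresh(), sigParam = u.fresh();
        u.unify(sigElem, sigParam);                // addElement stores its argument
        int[][] calls = { {vElem, s1}, {v2Elem, i1} };
        for (int[] call : calls) {
            int elem = sigElem, param = sigParam;
            if (polymorphic) {                     // fresh instance per call site
                elem = u.fresh();
                param = u.fresh();
                u.unify(elem, param);
            }
            u.unify(call[0], elem);
            u.unify(call[1], param);
        }
        return u.same(vElem, v2Elem);
    }

    public static void main(String[] args) {
        System.out.println(vectorsShareElementType(false)); // true: contexts merged
        System.out.println(vectorsShareElementType(true));  // false: v holds only s1's type
    }
}
```

With a single shared signature, the String and Integer uses are unified through `addElement`; with per-call-site instances they stay apart, which is what allows the downcast in Figure 1-1 to be proved safe.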

Based  on  experiences  with  Lackwit  [54],  a  similar  system  for  analyzing  C  programs,  I 
extended  the  analysis  in  several  ways: 

• The addition of polymorphic recursion [42] prevents loss of polymorphism in the
presence of mutually recursive declarations.

•  To  better  handle  Java  objects,  the  analysis  treats  “extensible  records”  [65]  in  a  clean 
way. 

•  I  changed  some  details  of  the  theory  and  implementation  to  improve  performance  and 
better  fit  Java  programs. 

These  features  are  extensively  discussed  and  evaluated  in  this  thesis.  The  general  problem 
of  type  inference  with  polymorphic  recursion  can  be  reduced  to  the  formal  problem  of 
semiunification  [42];  for  this  reason  I  call  this  alias  analysis  engine  “SEMI”. 

I  also  implemented  a  variant  of  Rapid  Type  Analysis  [9],  an  analysis  based  on  reasoning 
about  the  declared  types  of  variables.  Figure  1-3  shows  an  example  Ajax  configuration 
using  one  instance  of  SEMI  and  two  instances  of  RTA.  This  configuration  is  explained 
further  in  Section  4.4.5  and  Section  9.6. 
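For orientation, the core idea of Rapid Type Analysis as generally described in the literature can be sketched as follows: a receiver of a given declared type can only be an instance of a class that the program actually instantiates. This is not the variant implemented in Ajax (see Chapter 5); the class and method names are hypothetical, and the set of instantiated classes is simply assumed as input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy sketch of the core filtering step of Rapid Type Analysis.
class RtaSketch {
    // Possible run-time classes of a receiver: instantiated classes that
    // are subtypes of the receiver's declared type.
    static List<Class<?>> possibleReceivers(Class<?> declared, Set<Class<?>> instantiated) {
        List<Class<?>> result = new ArrayList<>();
        for (Class<?> c : instantiated) {
            if (declared.isAssignableFrom(c)) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed input: the classes some program creates instances of.
        Set<Class<?>> instantiated = Set.of(String.class, Integer.class);
        // A virtual call through a Number-typed receiver can only reach Integer here.
        System.out.println(possibleReceivers(Number.class, instantiated)); // [class java.lang.Integer]
    }
}
```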


1.2.4  Distinguishing  Features 

Some  unique  features  distinguish  Ajax  from  all  prior  work: 

• The SEMI analysis engine is the only engine combining full support for the Java
language, context sensitivity, and higher-order control flow analysis.

• SEMI is the only analysis engine for a real programming language that provides
polymorphic recursion and also distinguishes different fields of structures.

• Ajax is the only analysis toolkit able to provide aliasing information directly to tools in
a clean, efficient and analysis-independent way.

• Ajax is the only system able to prove the safety of Java downcasts related to generic
data structures (effectively reverse engineering the type parametricity of those
structures).

•  Ajax  has  the  only  object  modelling  tool  able  to  automatically  and  soundly  “split” 
classes  in  the  model. 

1.3  Contributions 

This  thesis  makes  the  following  technical  contributions: 

• It introduces and evaluates new techniques for performing generalized context-sensitive
alias analysis of Java code. These techniques extend previously published work in
several directions.

• It defines the value-point relation, and uses it to describe a flexible and general
interface for efficiently transmitting generalized alias information from analyses to tools
and other analyses. The ideas behind the value-point relation are not new, but the
relation has not previously been formally specified and used as the basis for an
implementation. Similarly, the interface between tools and analyses formalizes and generalizes
some existing ideas.

• It demonstrates a variety of tools that programmers can use to analyze Java programs,
including a tool for building object models and a tool that proves the safety of
downcasts associated with the use of Java generic containers.

• It shows how all the above contributions are achieved in the context of the full Java
language and realistic Java programs. This context imposes some fundamental difficulties
that must be faced by any system for global static analysis. The thesis explains the
difficulties and how they are addressed by Ajax.

1.4  Thesis  Overview 

The  thesis  comprises  five  major  sections. 

The  first  section  of  the  thesis  introduces  my  work  and  places  it  in  the  context  of  other  work 
on  program  analysis  and  software  engineering.  Chapter  2  surveys  the  related  work  and 
discusses  its  relationship  to  Ajax. 

The second section of the thesis explains the architecture of Ajax, in particular the
“value-point relation” interface that separates tools from analyses. In Chapter 3, I introduce the
VPR abstraction and describe how it is used to communicate alias information. It takes
some thought to actually realize this abstraction in a way that permits efficient
implementation; the resulting interface is described in Chapter 4. In Chapter 5, I present an extension
of RTA as an example of how an analysis can implement the VPR interface.

The third section of the thesis describes Ajax’s SEMI analysis. Chapter 6 formally defines
the analysis over a subset of the Java bytecode language, and proves that the analysis is
sound. Perhaps surprisingly, the proof reveals that the soundness of SEMI does not depend
on any static type safety properties of the analyzed program; if the class file can be parsed,
then the code can be correctly analyzed. Chapter 7 describes some of the actual
implementation details, in particular those that aim to improve performance. Unfortunately Java has
some features that are hard to treat with global static analysis; these features are discussed
in Chapter 8.

The fourth section of the thesis is a description of five tools built using Ajax, along with
quantitative and qualitative evaluations of those tools using a suite of example programs.
The example programs, which include “real-life” programs such as javac and some
large GUI applications along with the standard Java library, are described in Chapter 9.
Chapter 9 also presents quantitative results for two tools: one for resolving dynamic
method invocations, and one for finding dead code. This chapter focuses on comparing the
effectiveness of different analysis engines in different configurations. In Chapter 10, I
present and evaluate a tool for checking the validity of downcasts. Chapter 11 describes the
implementation and results of a tool for producing object models (similar to storage shape
graphs), which requires the use of multiple VPR queries and some amount of
post-processing. In Chapter 12, I present “JGrep,” a simple tool with a variety of uses, that
scans for certain kinds of aliases to expressions specified by the user.

Chapter  13  contains  the  conclusions  of  the  thesis.  In  brief,  I  have  achieved  the  main  goal  of 
the  thesis:  Ajax  performs  sound,  static,  global  alias  analysis;  provides  tools  to  answer 
programmers’  questions  using  this  information;  gives  results  significantly  more  useful  than 
those  obtainable  using  previous  systems;  and  is  practically  applicable  to  real  programs  and 
problems.  However,  I  have  identified  some  major  barriers  to  adoption  for  general  purpose, 
large  scale  programming.  One  problem  is  that  the  analysis  is  still  not  scalable  enough; 
SEMI  consumes  too  many  resources  and  seems  less  accurate  as  programs  get  larger.  More 
importantly,  most  real  Java  programs  use  language  features  —  such  as  reflection  and 
dynamic  loading  —  that  are  inherently  inimical  to  sound  global  static  analysis. 
2 Related Work


2.1  Introduction 

Much  work  has  been  done  in  areas  related  to  this  thesis.  The  Ajax  analysis  engines  are 
related  to  work  on  global  flow  and  closure  analysis,  alias  analysis,  and  type  inference 
systems.  The  Ajax  tools  are  similar  to  previous  systems  for  program  understanding. 

As  discussed  in  Section  1.2.1,  Ajax  separates  analyses  from  tools.  Analyses  compute 
generalized  alias  information  about  a  program,  and  tools  consume  the  information.  Ajax  is 
the  only  toolkit  able  to  provide  alias  information  directly  to  tools  in  a  clean,  efficient  and 
analysis-independent  way. 

The SEMI analysis engine also has unique properties. It is designed to handle real programs
using modern features such as objects and many levels of indirection. No other alias
analysis engine combines context sensitivity and higher-order control flow analysis with
full support for a modern programming language and the ability to handle realistically large
programs. SEMI is also the only engine for any language which uses polymorphic recursion
and also distinguishes different fields of structures.

Ajax provides some unique tools to demonstrate its power. Its downcast checking tool is
the only system able to prove the safety of Java downcasts related to generic data structures
(effectively reverse engineering the type parametricity of those structures). Ajax also
provides the only object modelling tool able to “split” classes in the model both
automatically and soundly; see Chapter 11 for details.

2.2  Program  Analyses 

This  section  describes  related  work  in  program  analysis.  Section  2.2.1  explains  why  it  is 
important  to  distinguish  fundamental  analysis  techniques  from  the  particular  problems  to 
which  they  are  applied.  Sections  2.2.2  and  2.2.3  define  some  terms  useful  for  classifying 
analyses,  and  give  some  general  comments  about  interpreting  the  results  of  work  in  this 
area.  The  following  sections  describe  the  actual  related  work,  clustered  according  to  the 
characteristics  of  each  analysis  technique. 

The  final  sections  deal  with  work  that  is  not  about  specific  program  analysis  techniques. 
Section  2.2.8  covers  type  inference  for  type  checking  in  programming  languages.  Section 
2.2.9  presents  work  on  composing  analyses,  and  Section  2.2.10  compares  program  analysis 
toolkits. 

2.2.1  Distinguishing  Analysis  Techniques  from  Analysis  Problems 

The problems of “flow analysis,” “closure analysis,” “higher-order control-flow analysis,”
“alias analysis,” and “concrete type inference” are all closely related, being attempts to
automatically and statically characterize the values of program variables. They differ only
in the types of the values they characterize and in the kinds of characterizations they make.

The same basic analysis techniques are often applied to different problems to yield
apparently different solutions. For example, a closure analysis is so called because it determines
which function bodies an expression denoting a higher-order function may evaluate to.
Alias analysis is so called because it determines which abstract memory locations
an expression denoting a pointer value may evaluate to. However, despite the
different contexts, and often radically different presentation styles, the same techniques can
be used to solve both problems. (Some alias analysis techniques are applicable only to
first-order code, limiting their utility for closure analysis.)

Prior  to  Ajax,  applying  an  existing  analysis  technique  to  a  new  problem  domain  often 
required  significant  effort.  For  example,  researchers  first  described  how  to  use  declared 
type  information  to  resolve  higher-order  control  flow  [22]  and  then  later  showed  how  to  use 
the  same  techniques  to  perform  general  alias  analysis  [23].  As  discussed  in  Section  1.2.1, 
Ajax  completely  separates  analyses  from  problem  contexts.  In  Ajax,  matching  an  analysis 
to  a  problem  context  is  a  simple  runtime  configuration  decision.  No  prior  work  has  this 
property. 

As  well  as  adding  useful  implementation  flexibility,  the  decoupling  of  analysis  techniques 
from  problem  contexts  makes  for  easier  comparison  of  the  underlying  techniques.  For 
example,  in  Chapter  5  I  show  that  the  two  analyses  mentioned  above,  both  based  on 
declared  types  and  superficially  similar,  are  actually  subtly  different  in  precision. 

In  this  discussion,  I  deemphasize  the  original  context  in  which  work  was  presented  and 
focus  on  underlying  techniques. 

2.2.2  Classifying  Analyses 

It  is  helpful  to  classify  analyses  according  to  whether  they  possess  “flow  sensitivity”  and/ 
or “context sensitivity”. These terms are used informally and inconsistently in the
literature. I adopt the following definitions:

•  An analysis is flow sensitive if, when expressed in the form of constraints, it uses
inclusion (subtype or subset) constraints.

The  intuition  behind  flow  sensitivity  is  that,  considering  the  program  fragment  “if  x  then  y 
else  z”,  a  flow  sensitive  analysis  can  determine  that  the  result  is  either  y  or  z  while  still 
distinguishing  y  and  z. 
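
As a rough illustration of this distinction, the following Java sketch solves the constraints for “r = if x then y else z” twice: once with inclusion constraints and once with equality constraints. It is a sketch only. The class and method names are hypothetical and belong to neither Ajax nor any cited system, the solver is deliberately naive, and it assumes that y refers to an abstract object A and z to an abstract object B.

```java
import java.util.*;

// Hypothetical illustration: inclusion versus equality constraints for the
// fragment "r = if x then y else z".
class ConstraintDemo {
    // Inclusion-based: each variable keeps its own set; a constraint
    // "sub is a subset of sup" copies elements from sub into sup.
    static Map<String, Set<String>> solveInclusion() {
        Map<String, Set<String>> pts = new HashMap<>();
        pts.put("y", new TreeSet<>(Arrays.asList("A")));  // assumed: y refers to A
        pts.put("z", new TreeSet<>(Arrays.asList("B")));  // assumed: z refers to B
        pts.put("r", new TreeSet<>());
        String[][] subsetOf = { {"y", "r"}, {"z", "r"} };  // {sub, sup} pairs
        boolean changed = true;
        while (changed) {                                  // naive fixpoint iteration
            changed = false;
            for (String[] c : subsetOf) {
                changed |= pts.get(c[1]).addAll(pts.get(c[0]));
            }
        }
        return pts;
    }

    // Equality-based: "r = y" and "r = z" force all three variables to share one set.
    static Map<String, Set<String>> solveEquality() {
        Set<String> shared = new TreeSet<>(Arrays.asList("A", "B"));
        Map<String, Set<String>> pts = new HashMap<>();
        for (String v : Arrays.asList("r", "y", "z")) {
            pts.put(v, shared);
        }
        return pts;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> inc = solveInclusion();
        Map<String, Set<String>> eq = solveEquality();
        for (String v : Arrays.asList("r", "y", "z")) {
            System.out.println(v + ": inclusion " + inc.get(v) + ", equality " + eq.get(v));
        }
    }
}
```

With inclusion constraints, r maps to both A and B while y and z keep their own singleton sets. With equality constraints, all three variables share the set containing A and B, so y and z are no longer distinguished.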

Many  authors  use  “flow  sensitive”  to  mean  that  the  analysis  may  produce  different  results 
depending  on  the  ordering  of  statements  within  a  method  or  function.  However,  with  this 
definition, any analysis can trivially be made flow-sensitive simply by converting the
program to static single assignment (SSA) form (for local variables) as the first phase of the
analysis.  Therefore,  such  a  definition  does  not  usefully  characterize  the  analysis  technique 
itself. 

•  An analysis is context sensitive if, when expressed in the form of constraints, it is possible
for two occurrences of the same program variable to induce equality or inclusion
constraints whose sets of free variables are disjoint.

The intuition behind context sensitivity is that the information obtained by a context
sensitive algorithm will not necessarily be improved by duplicating code that is used
multiple times in the analyzed programs. This includes analyses described as “polyvariant”
or “polymorphic,” and also some uses of intersection types [59].
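
The definition can be made concrete with a small Java sketch. It is hypothetical and not taken from any cited system: it generates equality constraints for two calls to a method “id” whose body simply returns its parameter p, and constraints are represented as plain strings purely for illustration. Passing a call-site label models a context-sensitive analysis that instantiates a fresh copy of the callee's variables per site; passing null models a context-insensitive analysis that shares one copy.

```java
import java.util.*;

// Hypothetical illustration of the "disjoint free variables" criterion.
class ContextDemo {
    // Constraints induced by the call "dst = id(src)", where id's body is "return p".
    // A null site means every call shares the same variables id.p and id.ret.
    static List<String> constraintsForCall(String dst, String src, String site) {
        String suffix = (site == null) ? "" : "@" + site;  // fresh copy per call site
        String p = "id.p" + suffix;
        String ret = "id.ret" + suffix;
        return Arrays.asList(src + " = " + p, p + " = " + ret, ret + " = " + dst);
    }

    // Variables mentioned by a list of "lhs = rhs" constraints.
    static Set<String> freeVars(List<String> constraints) {
        Set<String> vars = new TreeSet<>();
        for (String c : constraints) {
            vars.addAll(Arrays.asList(c.split(" = ")));
        }
        return vars;
    }

    // True if the constraints from the two call sites share no variables.
    static boolean disjoint(List<String> first, List<String> second) {
        return Collections.disjoint(freeVars(first), freeVars(second));
    }

    public static void main(String[] args) {
        boolean sensitive = disjoint(constraintsForCall("a", "x", "site1"),
                                     constraintsForCall("b", "y", "site2"));
        boolean insensitive = disjoint(constraintsForCall("a", "x", null),
                                       constraintsForCall("b", "y", null));
        System.out.println("context sensitive, disjoint:   " + sensitive);
        System.out.println("context insensitive, disjoint: " + insensitive);
    }
}
```

In the context-insensitive case both calls mention id.p and id.ret, so the constraints link x, y, a and b together; in the context-sensitive case the two constraint sets share no variables.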

Both  of  these  definitions  refer  to  data  flow  sensitivity,  i.e.,  they  describe  the  kinds  of 
constraints  used  to  approximate  data  flow  in  the  program.  I  am  not  concerned  with  control 
flow  sensitivity. 

These crude definitions can be usefully applied to most of the related work. However, because
the terms are used inconsistently in the literature, other authors may classify the same
analyses differently.

2.2.3  Describing  Results 

I  deliberately  emphasize  performance  demonstrated  in  practice  over  asymptotic  worst-case 
complexity.  Complexity  results  can  be  very  misleading  because  real  programs  almost 
always  have  characteristic  properties  that  prevent  them  from  triggering  the  worst-case 
behavior  of  many  algorithms  (ML  type  inference  is  the  classic  example).  Unfortunately, 
published  benchmark  results  can  also  be  misleading,  because  real  programs  almost  always 
have  properties  (such  as  internal  code  reuse)  that  are  not  exhibited  by  most  small 
benchmark  programs. 

Many  authors  report  results  in  terms  of  the  number  of  abstract  locations  associated  with 
load  or  store  operations  in  the  program  (i.e.,  sizes  of  points-to  sets).  Unfortunately,  this 
metric  is  not  very  useful,  because  the  domain  of  abstract  locations  often  varies  from 
analysis  to  analysis.  Indeed,  type  inference  analyses  do  not  directly  define  a  domain  of 
abstract  locations.  Furthermore,  it  is  not  clear  how  the  sizes  of  the  sets  relate  to  the  utility 
of  the  results.  An  analysis  that  maps  the  result  of  every  C  malloc  operation  to  the  same 
abstract  location  could  easily  produce  very  small  points-to  sets  but  be  absolutely  useless  in 
practice. Measurements that relate the dynamic behavior of a program to its static
approximation, such as the work of Grove et al. [37], are much more useful.

Many  of  the  alias  analyses  presented  below  assume  that  pointed-to  memory  locations  can 
have  only  one  outgoing  pointer,  or  in  other  words,  every  structure  can  have  only  one  field. 
For  structures  with  more  than  one  field,  the  fields  are  treated  as  one  and  not  distinguished. 
This can drastically change the performance characteristics of an analysis, because it
effectively reduces program data structure shape graphs from branching trees to linear
sequences, and ensures that all recursive structures become pure cycles. This
approximation is so common that it is not always clearly stated.

2.2.4  Flow  Sensitive,  Context  Insensitive  Analyses 

One  area  of  analysis  where  scalability  is  often  an  explicit  goal  is  alias  analysis  and  related 
problems,  such  as  side  effect  estimation. 

Andersen  [5]  gives  a  simple  flow-sensitive  algorithm  based  on  inclusion  constraints  for 
alias  analysis  of  C  programs.  It  is  often  thought  of  as  context-sensitive,  because  passing  a 
parameter  to  a  called  procedure  is  treated  as  assignment  of  the  actual  parameter  to  the 
formal  parameter;  flow  sensitivity  ensures  that  different  actual  parameters  at  different  call 
sites can be distinguished even when they map onto the same formal parameter.
Unfortunately, the result of a called procedure is never handled context sensitively; a returned
pointer always maps to the same set of abstract locations regardless of the calling context.
Thus, if access to object fields is consistently performed through accessor methods of the
object (as is often the case in Java programs), Andersen’s algorithm is equivalent to
requiring, for each declared field of a class, a single abstract storage location that
summarizes the contents of every runtime instance of that field.
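
The following small Java program shows the pattern in question. It is a hypothetical example, not one of the benchmarks discussed in this thesis, and the comments describe how the behavior stated above would apply to it rather than the output of any particular implementation.

```java
// Hypothetical example program with a field read only through an accessor.
class Box {
    private Object item;

    Box(Object item) { this.item = item; }

    // Accessor: every caller receives the value through this one return point.
    Object get() { return item; }
}

class AccessorDemo {
    static Object fromFirstBox()  { return new Box("text").get(); }
    static Object fromSecondBox() { return new Box(Integer.valueOf(42)).get(); }

    public static void main(String[] args) {
        Object r1 = fromFirstBox();   // at run time this is always a String
        Object r2 = fromSecondBox();  // at run time this is always an Integer
        // If the result of get() is not handled context sensitively, r1 and r2
        // receive the same set of abstract locations, so the analysis must assume
        // that either one may refer to the String or to the Integer.
        System.out.println(r1.getClass().getSimpleName() + " " + r2.getClass().getSimpleName());
    }
}
```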

In  a  series  of  reports  [30]  [75],  Aiken  and  his  collaborators  describe  methods  for  improving 
the  performance  of  inclusion-based  analyses  such  as  Andersen’s  algorithm.  This  work  is 
almost  exclusively  aimed  at  analyzing  large  C  programs  and  does  not  consider  context 
sensitivity.  Their  work  makes  Andersen’s  algorithm  practically  applicable  to  large 
programs.  Note  however  that  even  their  most  recent  results  make  the  “one  field  per 
structure”  approximation;  this  is  especially  significant  because  their  “projection  merging” 
technique  relies  on  type  constructors  having  small  arity. 

Rountev,  Milanova  and  Ryder  [66]  extend  the  improved  algorithm  to  model  multiple  fields 
per  object,  and  apply  it  to  Java  programs.  Their  method  effectively  transforms  programs  to 
first-order  code  before  analysis,  using  declared  type  information  and  analysis  of  the  class 
hierarchy  to  determine  possible  callees  of  indirect  method  calls.  They  do  not  attempt  to 
handle  reflection  and  completely  ignore  the  effects  of  library  code;  therefore  it  is  difficult 
to  interpret  their  results.  In  particular,  the  numbers  of  methods  they  find  to  be  dead  in  their 
test  programs  are  suspiciously  large. 

A  classic  approach  to  “higher  order  control  flow  analysis”  (“CFA”)  was  presented  by 
Shivers  [71].  Heintze  [39]  introduced  set-based  analysis.  Both  of  these  techniques  can  be 
thought  of  as  methods  for  higher-order  control  flow  analysis  using  inclusion  constraints. 
Since  then,  much  work  has  been  done  to  decrease  the  time  and  space  requirements  of  these 
techniques,  especially  when  some  kind  of  context  sensitivity  is  required. 

Heintze and McAllester [41] describe an implementation of CFA that answers certain
questions in linear time for programs that have types that are bounded in size.
Unfortunately, this approach cannot be directly applied to C and Java programs because its
treatment of recursive types is based on ML datatypes. If the entire Object type were treated
as one datatype, there would be a great loss of accuracy: it would be impossible to
distinguish different fields of the same object (other than scalar fields). This is because an ML
datatype has a fixed pattern of type recursion, so modelling Object with a datatype requires
all fields holding object references to have the same type as the containing object. Heintze
and McAllester's analysis uses type information to guide its approximations for dealing
with recursive types, and in this case it will resort to the gross approximation mentioned
above. Another problem with their method is that extending it with some kind of
polyvariance or polymorphism could lead to serious performance problems.

Flanagan  and  Felleisen  [33]  describe  an  implementation  of  set-based  analysis  designed  to 
handle  large  programs.  It  analyzes  each  component  separately,  generating  a  collection  of 
set  constraints  that  approximate  the  behavior  of  the  component,  then  simplifying  the 
constraints.  Finally  the  sets  of  simplified  constraints  are  combined  and  solved.  This  reduces 
the  amount  of  space  required  to  analyze  an  entire  program.  The  improvement  over  the  basic 
algorithm  is  very  impressive,  but  the  largest  program  analyzed  is  18,000  lines  of  Scheme, 
so  it  is  difficult  to  draw  conclusions  about  scalability,  or  about  its  behavior  on  object 
oriented programs.

DeFouw, Grove and Chambers [21] consider a framework of “fast” algorithms possessing
varying degrees of flow sensitivity and ranging from linear to cubic time complexity in the
size of the program. Sundaresan et al. [76] present new algorithms in this class, as do Tip
and  Palsberg  [80].  All  these  algorithms  could  easily  and  profitably  be  implemented  to 
produce  VPR  approximations  in  Ajax. 

2.2.5  Flow  Sensitive,  Context  Sensitive  Analyses 

Ruf [67] compares two flow-sensitive algorithms, one context-sensitive and the other
context-insensitive.  The  sets  of  possible  locations  at  each  load  or  store  were  almost 
identical,  leading  him  to  conclude  that  for  those  benchmarks,  context  sensitivity  was 
worthless.  However,  he  suggests  in  the  paper  that  those  results  may  not  generalize  to  larger 
programs.  (The  largest  program  considered  was  less  than  7,000  lines  of  C.) 

A  similar  study  was  done  by  Foster  et  al.  [34];  they  conclude  that  adding  context  sensitivity 
improves  the  accuracy  of  a  flow  insensitive  analysis,  but  not  a  flow  sensitive  analysis 
(Andersen’s algorithm). Unfortunately, their context-sensitive analyses do not distinguish
memory objects created by the same textual occurrence of “malloc”, and therefore may be
failing to exploit some of the power of context sensitivity (for example, by failing to
distinguish instances of heap-allocated abstract data types, which Lackwit and Ajax are able to
do).  They  observe  that  the  main  advantage  a  true  context-sensitive  algorithm  has  over  a 
flow-sensitive  algorithm  (such  as  Andersen’s  algorithm)  is  that  results  or  “out  parameters” 
of  function  calls  can  be  distinguished  in  different  contexts,  and  that  their  C  programs  do  not 
exhibit  much  of  this  kind  of  polymorphism,  functions  being  mostly  executed  for  their  side 
effects.  However,  Java  and  C++  encourage  reads  of  object  state  to  be  encapsulated  in 
accessor  methods,  so  “result  polymorphism”  is  much  more  common  in  programs  for  these 
languages. 

Ryder  and  her  collaborators  [74]  [14]  developed  a  series  of  algorithms  for  large-scale  flow- 
sensitive  alias  analysis,  and  embodied  them  in  a  toolkit.  Their  approach  is  based  on  the 
propagation  of  “points-to  sets”  encoding  the  aliasing  relationships  that  hold  at  each 
program  point.  Each  points-to  set  is  a  set  of  abstract  locations  that  a  pointer  may  be 
referring  to.  This  basic  method  is  extended  to  handle  higher-order  code  (by  dynamically 
updating  a  call  graph  and  incrementally  propagating  information  between  new  callees  and 
callers);  other  extensions  are  introduced  to  handle  structures,  exceptions  and  other  modern 
language features. Their most sophisticated general-purpose algorithm, which is also
context-sensitive [14], is demonstrated only on programs with less than 7,000 lines of C++
code.  (It  does  not  explicitly  handle  higher-order  code;  the  programs  are  first  reduced  to 
first-order  by  applying  class  hierarchy  analysis.)  Also,  they  have  one  abstract  location  for 
each  occurrence  of  a  call  to  “malloc”  in  the  source  code.  Therefore  this  analysis  can  never 
treat  memory  allocation  context-sensitively,  and  can  never  distinguish  instances  of  abstract 
data  types  which  are  allocated  by  a  common  constructor  function. 

Wilson  and  Lam  [84]  give  an  algorithm  for  context-sensitive,  flow-sensitive  alias  analysis 
for  C  programs  that  computes  abstractions  of  procedures,  called  “partial  transfer 
functions”,  that  depend  on  the  calling  context  but  can  often  be  reused  between  calling 
contexts  (often,  only  one  PTF  is  ever  computed  for  a  procedure).  Unfortunately,  they  only 
report  results  for  small,  mostly  numeric  applications  (no  larger  than  5,000  lines),  though 
their results are excellent. Because their PTFs depend on the alias patterns in the calling
context, and in particular depend on the actual values of function pointers passed in by the
caller,  it  is  not  clear  how  much  expensive  reanalysis  would  be  required  for  larger  programs 
with  complex  data  structures  and/or  use  of  function  pointers  (object  oriented  programs  fall 
into  this  category).  They  give  no  measurements  of  the  quality  of  the  results  of  their 
algorithm.  Also,  they  only  analyzed  C  programs  with  mostly  first-order  code. 

Cheng  and  Hwu  [16]  describe  another  PTF-based  technique  that  trades  off  accuracy  in 
exchange  for  better  scalability.  Their  system  has  been  successfully  used  as  part  of  an 
optimizing  compiler  for  the  C  SPEC  benchmarks.  According  to  my  definitions,  it  is  both 
flow  sensitive  and  context  sensitive,  but  it  does  make  a  number  of  approximations  that 
make  it  hard  to  compare  with  other  algorithms.  It  is  unclear  how  it  would  fare  on  object- 
oriented  programs. 

Plevyak’s  analysis  [63]  for  object-oriented  programs  is  based  on  “adaptive  splitting,” 
which  dynamically  adds  context  and  flow  sensitivity  when  needed  to  improve  the  accuracy 
of  the  analysis  on  some  particular  task.  The  analysis  is  used  as  the  basis  for  a  number  of 
optimizations  in  an  optimizing  compiler  for  a  Java-like  language,  ICC++.  The  analysis 
looks promising but, as is often the case, only relatively small programs are targeted (up to
25,000  lines  in  later  work  [24],  which  does  not  report  absolute  performance  results)  and 
direct  comparisons  with  other  systems  are  difficult. 

Grove,  Dean,  DeFouw  and  Chambers  [37]  survey  a  number  of  algorithms  for  “call  graph 
construction”  for  object  oriented  languages.  The  algorithms  studied  include  those  of 
Palsberg and Schwartzbach [60]; Oxhøj, Palsberg and Schwartzbach [56]; and Agesen [1].
The  call  graph  construction  problem  is  essentially  the  same  as  higher-order  control  flow 
analysis:  identify  the  possible  targets  of  an  indirect  function  (or  procedure,  or  method) 
invocation.  They  conclude  “our  experiments  demonstrated  that  scalability  problems 
prevent the flow-sensitive algorithms from being applied beyond the domain of small
benchmark  [Cecil]  programs.”  All  of  the  context-sensitive  algorithms  they  consider  are 
also  flow-sensitive.  The  algorithms  performed  much  better  on  Java  programs,  presumably 
because  Java  is  not  as  “pure”  an  object-oriented  language  as  Cecil  and  therefore  method 
dispatches  are  less  ubiquitous. 

Their  results  show  that  for  resolving  dispatches,  adding  flow  sensitivity  makes  more 
difference  than  adding  context-sensitivity,  if  the  context-sensitive  analysis  is  also  flow 
sensitive.  Unfortunately  it  is  hard  to  compare  their  results  to  mine,  because  our  systems 
make  different  assumptions.  For  example,  we  handle  library  code  differently  —  see 
Chapter  8. 

Fähndrich and Aiken [29] describe how to construct an interesting analysis framework that
incorporates inclusion constraints and polymorphism, but uses equational (i.e., flow
insensitive) constraints judiciously to improve the efficiency of the algorithm, where loss of
information is not as important. They apply the framework to the problem of inferring
uncaught exceptions in ML programs, but provide very little information on the actual
performance of their algorithm.

2.2.6 Simpler Analyses

In  response  to  the  expense  of  applying  known  flow-sensitive  or  context-sensitive  analyses, 
researchers  have  developed  fast,  but  somewhat  crude  algorithms  for  answering  various 
program  analysis  questions,  mostly  in  the  context  of  compilation  and  optimization. 

A  classic  algorithm  for  determining  the  possible  targets  of  a  method  call  is  “class  hierarchy 
analysis.”  In  a  statically  typed  language,  it  examines  the  class  that  the  source  program 
declared  for  the  object  reference  in  a  method  call;  the  run-time  class  of  the  object  must  be 
a  subclass  of  the  declared  class,  and  so  the  possible  targets  of  the  dispatch  are  the  method 
in  the  declared  class  (if  there  is  one),  and  any  overriding  method  declarations  in  those 
subclasses  [32,  20,  both  cited  in  9].  Even  languages  such  as  Smalltalk  that  lack  a  static  type 
discipline  can  use  similar  approaches,  by  computing  the  set  of  classes  which  declare  or 
inherit  a  method  implementation  compatible  with  the  call. 
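
A minimal Java sketch of class hierarchy analysis follows. It is illustrative only: the class ChaDemo, its string-based representation of classes and methods, and the example hierarchy are all hypothetical, and the sketch also walks up the hierarchy so that a method inherited by the declared class is found.

```java
import java.util.*;

// Hypothetical sketch of class hierarchy analysis over a toy class table.
class ChaDemo {
    final Map<String, String> superOf = new HashMap<>();         // class -> superclass (null for a root)
    final Map<String, Set<String>> declares = new HashMap<>();   // class -> methods it declares

    void addClass(String name, String parent, String... methods) {
        superOf.put(name, parent);
        declares.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    boolean isSubclassOrSelf(String cls, String ancestor) {
        for (String k = cls; k != null; k = superOf.get(k)) {
            if (k.equals(ancestor)) return true;
        }
        return false;
    }

    // Implementation selected when 'method' is invoked on an object whose exact
    // class is 'cls': the nearest declaration at or above cls, or null if none.
    String lookup(String cls, String method) {
        for (String k = cls; k != null; k = superOf.get(k)) {
            if (declares.getOrDefault(k, Collections.emptySet()).contains(method)) {
                return k + "." + method;
            }
        }
        return null;
    }

    // Possible targets: whatever the declared class or any of its subclasses selects.
    Set<String> targets(String declaredClass, String method) {
        Set<String> result = new TreeSet<>();
        for (String cls : superOf.keySet()) {
            if (isSubclassOrSelf(cls, declaredClass)) {
                String impl = lookup(cls, method);
                if (impl != null) result.add(impl);
            }
        }
        return result;
    }

    // Toy hierarchy: Circle and Square extend Shape; Unit extends Square.
    static ChaDemo example() {
        ChaDemo h = new ChaDemo();
        h.addClass("Shape", null, "area");
        h.addClass("Circle", "Shape", "area");
        h.addClass("Square", "Shape", "area");
        h.addClass("Unit", "Square");          // inherits Square.area
        return h;
    }

    public static void main(String[] args) {
        ChaDemo h = example();
        System.out.println(h.targets("Shape", "area"));
        System.out.println(h.targets("Square", "area"));
    }
}
```

For a receiver declared as Shape, all three implementations of area are possible targets; for a receiver declared as Square, only Square.area remains, so that call could be resolved statically.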

Diwan,  Moss  and  McKinley  [22]  [23]  extend  this  basic  method  with  intraprocedural  flow 
analysis  and  some  very  simple  (context  insensitive)  interprocedural  propagation  and 
handling  of  data  structures,  resulting  in  an  analysis  that  is  still  linear  in  practice.  Their 
algorithms  are  quite  effective  for  their  benchmarks,  but  the  benchmarks  are  mostly  small. 
In  their  system  for  resolving  dynamic  method  invocations  [22],  the  only  program 
(“Trestle”)  that  consists  of  more  than  20,000  lines  of  code  gives  their  second-poorest  result, 
resolving almost none of the roughly 20% of dynamic method invocations that occur at
monomorphic call sites (i.e., call sites observed always to call the same method
implementation at run-time). Interestingly, they comment that this program is the only one of their
benchmarks  that  might  benefit  significantly  from  context  sensitivity. 

Bacon  and  Sweeney  [9]  extend  class  hierarchy  analysis  with  “Rapid  Type  Analysis,”  which 
essentially  eliminates  dead  code  and  classes  in  C++  programs,  by  starting  with  the 
assumption  that  only  “main”  is  called  and  adding  in  classes,  procedures  and  methods  as 
necessary  until  a  safe  approximation  is  reached.  The  analysis  runs  in  linear  time  and  gives 
good  results  for  many  programs,  particularly  because  stripping  out  entire  unused  classes 
can often improve the results of class hierarchy analysis. However, most of their
benchmarks do not exploit subclass polymorphism, and the benchmarks are mostly small (only
one  has  more  than  20,000  lines  of  code).  An  interesting  lesson  from  their  work  is  that  it  is 
highly  desirable  for  an  analysis  to  ignore  code  shown  to  be  dead.  RTA  achieves  this  by 
approximating  the  set  of  live  methods  from  below;  Ajax  generalizes  this  strategy  and  uses 
it  for  all  its  analyses.  Also,  because  of  RTA’s  simplicity,  efficiency  and  effectiveness,  I 
have  used  it  as  the  basis  for  one  of  the  Ajax  analysis  engines. 
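
The idea of approximating the set of live methods from below can be sketched as follows. This is a much-simplified, hypothetical model written for illustration. It is neither Bacon and Sweeney's implementation nor the Ajax RTA engine: method bodies are reduced to the classes they instantiate and the virtual calls they make, and a naive fixpoint loop is used instead of a linear-time worklist.

```java
import java.util.*;

// Hypothetical, simplified model of Rapid Type Analysis.
class RtaDemo {
    final Map<String, String> superOf = new HashMap<>();         // class -> superclass (null for a root)
    final Set<String> declared = new HashSet<>();                // implementations, named "Class.method"
    final Map<String, List<String>> allocs = new HashMap<>();    // method -> classes it instantiates
    final Map<String, List<String[]>> calls = new HashMap<>();   // method -> {declared class, method name}

    void alloc(String caller, String cls) {
        allocs.computeIfAbsent(caller, k -> new ArrayList<>()).add(cls);
    }

    void call(String caller, String declaredClass, String method) {
        calls.computeIfAbsent(caller, k -> new ArrayList<>()).add(new String[] {declaredClass, method});
    }

    boolean isSubclassOrSelf(String cls, String ancestor) {
        for (String k = cls; k != null; k = superOf.get(k)) {
            if (k.equals(ancestor)) return true;
        }
        return false;
    }

    // Implementation that a call to 'method' selects on an object of exact class 'cls'.
    String lookup(String cls, String method) {
        for (String k = cls; k != null; k = superOf.get(k)) {
            if (declared.contains(k + "." + method)) return k + "." + method;
        }
        return null;
    }

    // Start with only the entry point live, then add instantiated classes and
    // reachable methods until nothing changes.
    Set<String> liveMethods(String entry) {
        Set<String> live = new TreeSet<>();
        live.add(entry);
        Set<String> instantiated = new TreeSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String m : new ArrayList<>(live)) {   // copy: 'live' grows inside the loop
                changed |= instantiated.addAll(allocs.getOrDefault(m, Collections.emptyList()));
                for (String[] c : calls.getOrDefault(m, Collections.emptyList())) {
                    for (String cls : instantiated) {
                        if (!isSubclassOrSelf(cls, c[0])) continue;
                        String impl = lookup(cls, c[1]);
                        if (impl != null) changed |= live.add(impl);
                    }
                }
            }
        }
        return live;
    }

    // Toy program: main instantiates only Circle and calls area() on a Shape.
    static RtaDemo example() {
        RtaDemo p = new RtaDemo();
        p.superOf.put("Shape", null);
        p.superOf.put("Circle", "Shape");
        p.superOf.put("Square", "Shape");
        p.declared.addAll(Arrays.asList("Shape.area", "Circle.area", "Square.area"));
        p.alloc("main", "Circle");
        p.call("main", "Shape", "area");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(example().liveMethods("main"));
    }
}
```

In the toy program only Circle is ever instantiated, so only main and Circle.area are found live. Shape.area and Square.area are treated as dead even though class hierarchy analysis alone would list them as possible targets of the call.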

Steensgaard  [72]  applied  a  very  simple  type  inference  scheme  to  analyze  aliasing  for  C 
programs.  In  its  original  incarnation,  it  did  not  distinguish  members  of  the  same  record,  and 
it  was  context  and  flow  insensitive.  The  ability  to  distinguish  record  members  was  added 
in  later  work  [73].  In  practice,  these  schemes  scale  to  very  large  programs  with  millions  of 
lines  of  code.  Other  variations  have  been  created  which  introduce  carefully  limited  flow 
sensitivity  while  retaining  scalability  [19]. 
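
To give a feel for analyses of this kind, here is a small Java sketch of unification-based (equality constraint) alias analysis using a union-find structure. It is a simplified illustration under stated assumptions, not Steensgaard's algorithm in full: it handles only the statements “x = &a” and “x = y”, gives each equivalence class at most one pointee class, and all names (UnifyDemo, addressOf, copy, mayAlias) are hypothetical.

```java
import java.util.*;

// Hypothetical sketch of unification-based alias analysis.
class UnifyDemo {
    private final Map<String, String> parent = new HashMap<>();   // union-find forest
    private final Map<String, String> pointee = new HashMap<>();  // representative -> node it points to

    String find(String x) {
        parent.putIfAbsent(x, x);
        String p = parent.get(x);
        if (p.equals(x)) return x;
        String root = find(p);
        parent.put(x, root);               // path compression
        return root;
    }

    // Representative of the class that x points to, created on demand.
    String target(String x) {
        String r = find(x);
        return find(pointee.computeIfAbsent(r, k -> "*" + k));
    }

    void union(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (ra.equals(rb)) return;
        String pa = pointee.remove(ra);
        String pb = pointee.get(rb);
        parent.put(ra, rb);
        if (pa != null) {
            if (pb == null) pointee.put(rb, pa);
            else union(pa, pb);            // merging two pointers merges what they point to
        }
    }

    void addressOf(String x, String a) { union(target(x), a); }           // x = &a
    void copy(String x, String y)      { union(target(x), target(y)); }   // x = y

    // True if x and y are assumed to point into the same equivalence class.
    boolean mayAlias(String x, String y) { return target(x).equals(target(y)); }

    public static void main(String[] args) {
        UnifyDemo u = new UnifyDemo();
        u.addressOf("p", "a");             // p = &a
        u.addressOf("q", "b");             // q = &b
        System.out.println("before: " + u.mayAlias("p", "q"));
        u.copy("r", "p");                  // r = p
        u.copy("r", "q");                  // r = q
        System.out.println("after:  " + u.mayAlias("p", "q"));
    }
}
```

Before the two copies, p and q point to distinct classes. Afterwards, the equality constraints for r have merged a and b into one class, so p and q are reported as possible aliases. An inclusion-based analysis would keep them apart; this loss of precision is the price of the near-linear cost of unification.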

Heintze  [40]  describes  extensions  of  the  equivalence  results  of  Palsberg  and  O'Keefe  [58] 
that,  among  other  things,  show  the  equivalence  of  unification-based  type  inference  (i.e., 
without  subtyping)  to  a  simple  closure  analysis.  There  are  no  empirical  results,  and 
polymorphic type systems are not treated. The type system obtained is very similar to that
used for binding time analysis by Bondorf and Jørgensen [8]. The analysis is more powerful
than  Steensgaard's  [72],  but  less  powerful  than  Wright  and  Cartwright's  [85]  (see  below). 

2.2.7  Flow  Insensitive,  Context  Sensitive  Analyses 

Several  researchers  have  produced  flow  insensitive,  context  sensitive  program  analyses 
based  on  the  Hindley-Milner  algorithm  for  inferring  polymorphic  types  in  languages  based 
on  lambda  calculi  [49].  This  algorithm  is  attractive  because  of  its  exceptional  simplicity,  its 
elegant  handling  of  higher-order  code  and  complex  data  structures,  and  its  proven 
scalability  in  some  contexts,  such  as  type  inference  for  ML  [50]. 

Tofte  and  Talpin's  region  inference  [81]  is  somewhat  similar  to  the  SEMI  algorithm  used 
in Ajax, partly because it uses polymorphic recursion [42]. There are significant
differences, however. Their system is unnecessarily complex (for my purposes) because it
includes  effect  inference,  which  I  do  not  need.  On  the  other  hand,  their  treatment  of 
recursive  types  is  insufficient  for  my  needs  because  they  analyze  ML  programs  which  have 
explicit  datatype  declarations  describing  the  recursive  types.  Their  use  of  polymorphic 
recursion is also limited to the region variables, but my usage is much more general. Also,
my work is in totally different application domains from theirs, so the results are
incomparable.

Wright  and  Cartwright's  soft  typing  system  for  Scheme  [85]  handles  recursive  types, 
records,  and  polymorphism,  but  it  does  not  distinguish  different  instances  of  the  same  basic 
type,  which  is  a  fundamental  requirement  for  many  of  my  applications.  For  example,  if  two 
variables  both  refer  to  lists  of  integers,  Soft  Scheme  must  assume  that  the  references  are 
aliased. 

Lackwit [54] [55] is a system using polymorphic type inference to perform alias analysis of
large  C  programs.  It  was  the  direct  predecessor  to  Ajax.  Lackwit’s  analysis  worked  well  — 
analyzing  more  than  100,000  lines  of  code  in  less  than  64MB  of  RAM  —  and  handled 
recursive types, structures, and some uses of type casting. However, SEMI improves on it
in  several  ways,  as  discussed  in  Section  1.2.3.  Also,  the  design  of  Ajax  as  a  “tool  suite” 
stems  directly  from  the  shortcomings  of  Lackwit  as  an  “all  in  one”  tool. 

Liang and Harrold [62] constructed a similar analysis for C programs by extending
Steensgaard’s algorithm. They do not distinguish structure fields or handle higher-order code.
Their  test  programs  have  less  than  25,000  lines  of  code. 

Fähndrich et al. [31] built an analysis similar to Lackwit, adding polymorphic recursion and
“polarity”  information  to  instantiation  constraints.  The  polarity  information  improves 
accuracy  without  much  effect  on  performance.  They  achieve  good  scalability  results  on  C 
programs, but their system does not discriminate between the fields of structures, which
avoids some of the performance problems which I had to address in SEMI. My SEMI
analysis  could  exploit  polarity  information  in  the  same  way  to  improve  its  accuracy. 

Pessaux  and  Leroy  [61]  created  an  analysis  for  finding  uncaught  exceptions  in  O’Caml 
programs.  Previous  approaches  had  used  inclusion  constraints;  they  abandoned  these  in 
favor  of  unification-based  type  inference  and  polymorphic  recursion.  They  have  some 
interesting  comments  about  the  tradeoffs  involved;  they  saw  little  degradation  in  accuracy, 
and  were  actually  able  to  increase  precision  because  the  simpler  technology  allowed  them 
to build a more complete analysis. Their analysis is impressive; they can analyze nearly
20,000 lines of (non-object-oriented) O’Caml code. Because they are interested in
recovering only the concrete types of exceptions which can be thrown, their analysis and results
are  not  directly  comparable  with  systems  such  as  Ajax. 

There  has  been  much  recent  work  on  specialised  alias  analyses  for  Java  for  tasks  such  as 
escape  analysis  and  synchronization  removal  [17]  [10]  [11]  [83]  [4].  The  analysis  most 
similar to SEMI is Ruf’s [69]. It computes similar information to Ajax, partitioning object
references  into  equivalence  classes  and  propagating  information  from  callees  to  callers  in 
a  context-sensitive  manner.  His  analysis  is  much  faster  than  SEMI.  This  is  partly  because 
it  is  applied  to  programs  that  have  already  been  transformed  to  be  first-order,  and  it  does 
not  support  polymorphic  recursion.  He  also  uses  several  tricks  to  improve  performance  for 
his  particular  task.  Even  when  SEMI  is  configured  to  reduce  the  program  to  first-order 
before analysis, and full polymorphic recursion is disabled, Ruf’s analysis is still much
faster.  This  indicates  that  when  polymorphic  recursion  or  incremental  analysis  are  not 
required,  deterministic  propagation  of  summaries  along  the  call  graph  is  much  more 
efficient  than  using  a  general  incremental  constraint  system  like  SEMI.  Lackwit  used  a 
similar  single-pass  deterministic  algorithm  to  propagate  type  information  from  the  leaves 
of  the  graph  of  program  declarations  up  to  the  root,  and  it  also  seems  to  be  much  faster  than 
SEMI. 

2.2.8  Type  Inference  for  Object  Oriented  Languages 

Many  researchers  have  developed  sophisticated  type  inference  systems,  and  there  has  been 
much  recent  work  on  integrating  object-oriented  features  into  languages  with  type 
inference.  These  systems  mostly  rely  on  introducing  inclusion  (subtyping)  constraints,  and 
their  performance  is  usually  not  evaluated.  Furthermore,  as  for  the  soft  typing  system 
discussed  above,  these  inference  systems  are  oriented  towards  finding  type  errors  and  do 
not  attempt  to  distinguish  values  with  the  same  concrete  type  (e.g.,  two  integers,  or  two 
objects  with  identical  structure). 

Although  not  for  object  oriented  programs,  Henglein’s  exposition  of  type  inference  for 
polymorphic  recursion  [42]  was  a  major  influence  on  my  work  and  the  work  of  others. 

Eifrig,  Smith  and  Trifonov  [27]  give  a  rich  type  inference  system  for  languages  with  object 
oriented  features  (with  support  for  state  and  records).  There  is  no  mention  at  all  of  any 
implementation  or  its  performance. 

Palsberg  and  O'Keefe  [58]  prove  that  a  certain  simple  type  inference  system  with  recursive 
types  and  subtyping  is  equivalent  to  a  standard  closure  analysis.  Obviously  performance 
problems  exhibited  by  flow  analyses  will  carry  over  to  the  equivalent  type  inference 
systems,  unless  we  relinquish  some  expressive  power.  Context  sensitive  closure  analyses 
or  polymorphic  type  systems  are  not  treated. 

Palsberg  [57]  describes  a  type  inference  algorithm  for  Abadi  and  Cardelli's  object  calculus. 
The algorithm incorporates subtype constraints, and requires O(n^3) time in the worst case
because  it  computes  a  transitive  closure;  empirical  results  are  not  reported.  It  does  not 
incorporate  parametric  polymorphism.  Because  the  subtyping  rule  is  based  on  record 
extension  (requiring  common  fields  to  have  the  same  type),  parametric  polymorphism 
would be required to ensure true context sensitivity.

Rémy and Vouillon [65] describe the type system of Objective Caml, which provides type
inference  for  an  object-oriented  extension  of  ML,  without  the  use  of  subtype  constraints. 
They  use  polymorphic  row  variable  types  to  write  functions  that  are  polymorphic  over 
object  types.  (Row  variables  range  over  a  set  of  unknown  fields  and  their  types.)  They 
require  explicit  coercions  in  other  situations  (e.g.,  heterogeneous  containers).  They  can 
infer  recursive  types  in  function  and  method  signatures.  This  type  system  is  very  close  to 
the  type  system  used  by  SEMI,  except  that  because  their  source  programs  have  properly 
block-structured  declarations,  they  have  no  need  for  polymorphic  recursion.  Furthermore, 
like  Wright  and  Cartwright's  Soft  Scheme,  the  system  is  designed  to  prove  type  safety,  and 
has  none  of  the  extensions  required  to  collect  other  information.  Also,  the  language  is 
intended  to  be  class-based,  but  class  types  are  not  suitable  for  my  purposes.  In  my  system, 
the  type  inferred  for  an  object  of  class  A  may  encode  information  about  the  subclasses  of 
A  as  well,  since  the  object  could  be  one  of  those  subclasses.  This  information  is  neither 
needed  nor  allowed  in  O'Caml,  since  it  breaks  modularity  and  is  not  useful  for 
typechecking. 

Duggan  [25]  proposes  a  type  inference  procedure  for  reverse  engineering  parameterized 
types  from  Java  code.  His  system  is  significantly  more  complex  than  SEMI  and  Ajax’s 
downcast  checker,  because  it  is  construed  as  a  source-to-source  translation  from  Java  to 
“PolyJava”, an extension of Java with bounded parametric polymorphism. Therefore he is
concerned  with  ensuring  that  the  translated  code  typechecks  and  has  the  same  semantics  as 
the  original  code.  Most  importantly,  he  has  not  implemented  the  analysis,  so  its  behavior  in 
practice  is  unknown. 

2.2.9  Composing  Analyses 

Hybrid approaches to closure analysis and alias analysis have been proposed that combine
traditional  flow  analysis  of  abstract  values  with  type  inference.  Ruf  [68]  and  Zhang,  Ryder 
and  Landi  [86]  [87]  suggest  similar  schemes  for  alias  analysis  that  first  apply  a  fast  type 
inference  analysis,  and  then  use  the  results  to  select  a  subset  of  the  program  to  be  analyzed 
with  a  more  expensive  flow  analysis  to  obtain  more  precise  information  for  a  certain  set  of 
values.  In  fact,  this  approach  can  actually  improve  the  accuracy  of  the  results  because 
analyses  are  often  precise  or  imprecise  in  different  ways,  and  taking  the  intersection  of  the 
results  can  be  better  than  any  single  set  of  results.  The  Ajax  framework  explicitly  supports 
this  kind  of  composition;  see  Section  4.4.5. 
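
The point about intersection can be illustrated with a small sketch. It assumes, purely for illustration, that each analysis reports a set of may-alias pairs that over-approximates the true set; the names `true_aliases`, `fast_result`, `flow_result` and `compose` are hypothetical and are not part of Ajax.

```python
# Hypothetical may-alias pairs; none of these names come from Ajax.
true_aliases = {("p", "q")}

# Two conservative analyses: each result is a superset of the truth,
# but each one is imprecise about a different pair.
fast_result = {("p", "q"), ("p", "r")}
flow_result = {("p", "q"), ("q", "s")}


def compose(*results):
    """Intersect conservative results; the intersection is still a superset
    of the truth, because every input is."""
    combined = set(results[0])
    for r in results[1:]:
        combined &= r
    return combined


combined = compose(fast_result, flow_result)
```

Because both inputs contain every true pair, so does their intersection, while each spurious pair survives only if both analyses report it.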

2.2.10  Analysis  Toolkits 

One  of  the  strengths  of  Ajax  is  its  modular  design,  enabling  tools  for  different  tasks  to  be 
quickly  and  easily  built  using  a  simple,  powerful  abstraction  of  alias  information.  Two 
“state  of  the  art”  toolkits  for  global  static  analysis  are  BANE  [2]  and  PAF  [74]. 

BANE  [2]  provides  an  engine  for  solving  term  equality  and  set  inclusion  constraints.  It  also 
supports  Hindley-Milner  style  polymorphism  (but  not  polymorphic  recursion).  To 
implement  a  task-specific  tool  using  BANE,  the  implementor  must  create  a  front  end  to 
traverse  program  code  and  build  a  set  of  constraints  to  be  solved.  The  implementor  must 
also  create  a  “back  end”  to  interpret  the  solved  constraints  in  order  to  solve  the  problem  at 
hand.  In  particular,  the  implementor  must  determine  how  to  express  the  problem  in  the  form 
of  constraints,  and  prove  that  the  constraint  problem  corresponds  to  the  real  problem.  In 
contrast, an Ajax tool implementor is provided with the VPR abstraction of semantic information,
without having to write any front end code, and without having to worry about how
the information was produced. In most cases the implementor’s desired information can be
extracted directly from the VPR. The price is that Ajax can only provide aliasing information;
BANE could be reused in other contexts.

Like  Ajax,  PAF  [74]  computes  alias  analyses  of  programs.  However,  it  does  not  provide  an 
abstract  interface  comparable  to  the  VPR.  Instead,  the  analyses  produce  “points-to  sets” 
listing,  for  each  pointer  dereference  in  the  program,  the  abstract  locations  the  pointer  could 
be  pointing  to.  For  a  tool  to  use  this  information,  it  must  encode  the  meaning  of  the  abstract 
locations;  this  is  undesirable  because  the  domain  of  abstract  locations  could  change 
depending  on  the  analysis  method  being  used.  It  is  also  undesirable  because  it  places  an 
unnecessary  burden  of  understanding  on  the  tool  implementor.  Also,  it  is  not  always 
efficient  to  explicitly  convert  analysis  results  into  points-to  sets  and  then  interpret  those 
sets;  the  points-to  sets  can  be  very  large.  The  VPR  is  designed  to  avoid  this  bottleneck. 

2.3  Software  Engineering  Tools 

2.3.1  Software  Engineering  Tools  for  Program  Understanding 

There  are  many  tools  that  address  aspects  of  the  program  understanding  task,  some  built  as 
research  projects  and  some  as  commercial  products.  Almost  exclusively,  such  tools  that 
aim  to  be  scalable  do  not  rely  on  semantics-based  analyses,  but  operate  at  the  lexical  or 
syntactic  level.  For  example,  the  products  of  Imagix  Corporation  [90]  provide  a  number  of 
different  views  and  summaries  of  program  source  code,  all  of  which  rely  on  lexical  and 
syntactic  information,  or  on  profile  information  gathered  by  running  the  program.  The  C 
Information  Abstraction  system  [15],  and  its  successors  and  many  other  similar  systems, 
essentially  treat  a  program  as  an  abstract  syntax  tree  without  assigning  meaning  to  the 
syntax  elements.  In  CIA,  this  information  is  imported  into  a  database,  and  various  relational 
queries  can  then  be  used  to  extract  useful  information.  For  example,  the  tool  could  rapidly 
locate  all  mentions  of  a  particular  field  of  a  given  structure  type.  My  work  extends  these 
ideas  by  providing  much  richer  information  about  the  semantics  of  the  program. 

Murphy  and  Notkin  developed  some  lexical  analyses  that  are  particularly  efficient  and  easy 
to  customize  [51].  Due  to  its  lexical  nature,  their  tool  can  be  more  flexible  (for  example,  it 
can  analyze  programs  written  in  multiple  languages),  and  will  be  more  efficient  in  most 
cases.  Its  strength  is  also  its  weakness.  By  operating  purely  at  the  lexical  level,  it  cannot 
address  semantic  queries  with  the  precision  or  soundness  of  semantics-based  analysis. 

The  same  researchers'  Reflection  Model  Tool  (“RMT”)  [52]  allows  the  results  of  a  static 
analysis  to  be  presented  at  a  more  abstract  level  than  the  code,  such  as  an  architecture 
diagram,  and  to  be  compared  to  the  expectations  that  the  user  has  for  that  level.  It  assumes 
that  the  result  of  the  source  code  analysis  is  a  graph,  and  produces  diagrams  to  show  how 
the  abstracted  graph  differs  from  that  expected.  RMT  is  independent  of  the  tool  used  to 
analyze  the  source  code,  and  my  tools  could  be  used  in  that  role. 

Bowdidge  and  Griswold's  “Star  Diagram”  tool  [7]  and  its  successors  aid  in  encapsulating 
abstract  data  types,  by  presenting  a  special  view  of  the  program  that  focuses  on  a  particular 
variable.  They  assume  that  there  is  a  single  variable  to  be  abstracted,  but  they  discuss 
extending their method to operate on data structures with multiple instances. They consider
operating  on  all  data  structures  of  a  certain  type,  but  comment  “The  potential  shortcoming 
of  this  approach  is  that  two  data  structures  of  the  same  representation  type,  particularly  two 
arrays,  might  be  used  for  sufficiently  different  purposes  that  they  are  not  really  instances  of 
the  same  type  abstraction.”  Ajax  and  SEMI  solve  this  problem. 

The  Womble  object  modelling  tool  [46]  uses  syntactic  analysis,  intraprocedural  analysis, 
heuristics  and  built-in  knowledge  of  the  Java  class  library  to  produce  object  models  [70]  of 
Java  programs.  It  is  not  sound;  its  object  models  can  fail  to  reveal  class  relationships  that 
actually  exist  in  the  program.  In  contrast,  the  Ajax  object  modelling  tool  is  sound,  and  can 
accurately  “split”  classes  without  being  given  any  special  information  other  than  the  code. 
See Chapter 11 for more details.

2.3.2  Semantics-based  Tools  For  Program  Understanding 

The  majority  of  work  from  the  software  engineering  community  that  tries  to  capture  truly 
semantic  information  is  concerned  with  slicing  [82]  [78]  —  that  is,  the  identification  of  a 
subset  of  a  program  that  completely  determines  the  value  of  a  given  variable  at  a  given 
program  point.  This  kind  of  information  may  be  useful  for  testing,  debugging  and  other 
applications.  Unfortunately,  most  efforts  to  date  have  failed  to  achieve  any  kind  of 
scalability  or  to  operate  on  realistic  languages  and  programs.  The  most  realistic  slicing  tool 
available  is  Grammatech’s  CodeSurfer  product  [89].  CodeSurfer  analyzes  C  programs  and 
relies  on  Andersen’s  algorithm  to  resolve  aliasing  in  order  to  compute  more  accurate 
dataflow  graphs.  My  work  shows  that  alias  information  itself  can  be  used  to  solve  several 
problems  of  interest  to  the  software  engineering  community. 

The  Anno  Domini  tool  [26]  uses  monomorphic,  unification-based  type  inference  to 
compute  “Y2K”  type  information  for  data  in  COBOL  programs.  Anno  Domini  is  a  tool 
designed  to  support  one  task  very  well.  Ajax  is  designed  to  enable  cheap  construction  of 
many  such  “domain  specific”  tools. 

2.4  Language  Semantics 

This  thesis  presents  a  soundness  proof  for  SEMI,  which  requires  specification  of  the 
semantics  of  the  source  language  —  in  this  case,  a  large  subset  of  Java  bytecode.  The 
semantics  presented  here  are  a  correction  and  simplification  of  the  work  of  Qian  [64].  In 
contrast  with  other  semantics  for  Java  bytecode,  my  semantics  are  completely  dynamic  and 
rather  “lax”.  There  are  no  static  checks,  and  the  only  run-time  checks  are  those  necessary 
to  ensure  deterministic  and  sensible  execution.  This  is  because  Ajax  is  not  concerned  with 
verifying  the  static  safety  of  Java  bytecode;  in  fact,  the  soundness  proofs  demonstrate  that 
SEMI  can  soundly  analyze  bytecode  which  violates  any  and  all  static  safety  constraints. 

However, it is also true that the techniques that underlie Ajax, and SEMI in particular, can
be  useful  in  performing  static  typechecking  of  bytecode.  I  have  done  some  work  in  this  area 
[53],  but  it  is  beyond  the  scope  of  this  thesis. 

3 The Value-Point Relation: Separating Analyses from Tools


3.1  Overview 

The  design  of  Ajax  separates  analyses,  which  produce  alias  information,  from  tools,  which 
consume  the  information.  This  chapter  presents  a  high  level  functional  specification  of  the 
interface  between  tools  and  analyses.  Chapter  4  describes  details  of  the  interface  which 
allow  analyses  and  tools  to  work  together  efficiently. 

3.1.1  Desirability  of  Simple  Semantics 

In  previous  systems,  alias  information  is  encoded  in  formats  specific  to  the  analysis  used. 
For  example,  many  analyses  compute  “points-to”  sets.  For  a  pointer  variable  or  expression 
in  a  program,  such  an  algorithm  computes  a  static  set  of  abstract  locations;  each  abstract 
location  represents  one  or  more  real  memory  locations  that  the  variable  may  point  to  at  run 
time.  A  tool  that  interprets  points-to  sets  requires  knowledge  of  the  abstraction  mapping, 
which  varies  from  analysis  to  analysis.  Furthermore,  in  practice,  an  analysis  will  compute 
points-to  information  for  some  subset  of  the  pointer  variables  and  expressions  in  the 
program;  tools  need  to  know  exactly  which  subset,  or  be  able  to  specify  it  in  advance.  If  the 
analysis  treats  the  program  in  some  intermediate  form,  tools  need  to  understand  the  same 
format. 

This  dependence  on  details  of  specific  analyses  prevents  arbitrary  combination  of  analyses 
with  tools.  More  importantly,  it  also  increases  the  cost  of  tool  construction  even  if  only  one 
analysis  is  provided.  Tool  designers  must  understand  details  of  the  analysis,  and  this 
knowledge  must  be  encoded  in  the  tool  code. 

Therefore,  I  propose  that  an  interface  between  tools  and  analyses  should  reveal  as  little  as 
possible  of  the  mechanism  of  the  analysis.  The  specification  of  the  interface  presented  to  a 
tool,  written  out  purely  in  terms  of  the  semantics  of  the  programming  language,  should  be 
as  simple  as  possible. 

3.1.2  The  Value-Point  Relation 

The  value-point  relation  (VPR)  is  a  well-defined  abstract  property  of  Java  bytecode 
programs,  encoding  generalized  alias  information.  The  VPR  for  a  given  program  is  static; 
it  summarizes  all  possible  executions  of  the  program.  An  analysis  is  required  to  compute  a 
conservative  approximation  to  the  VPR,  that  is,  any  relation  that  includes  the  VPR. 


The  VPR  is  defined  directly  in  terms  of  the  Java  bytecode  language  (“JBC”).  A  full  formal 
definition  would  require  complete  semantics  for  JBC,  the  definition  of  which  is  beyond  the 
scope  of  this  thesis.  Instead,  the  VPR  is  defined  in  terms  of  a  subset  language,  “Micro”  Java 
bytecode  (“MJBC”),  for  which  I  provide  complete  semantics. 

3.2  Semantics  of  the  Micro  Java  Bytecode  Language 

This  section  formally  defines  the  semantics  of  MJBC.  Both  natural  (untagged)  and  tagged 
semantics  are  given.  The  style  is  small-step  operational  semantics. 

3.2.1  Preamble 

The  MJBC  language  was  originally  based  on  Qian’s  formalization  of  a  JBC  subset  [64]. 

There  is  no  single  syntactic  entity  corresponding  to  a  “JBC  program”.  At  any  given  moment 
at  run  time,  there  is  a  set  of  class  files  that  have  been  loaded  into  the  virtual  machine.  New 
class files could be added at any time, for example, from a user-specified location on the
Internet.  To  avoid  issues  of  unknown  code  and  dynamic  loading,  the  MJBC  semantics 
assume  that  the  set  of  class  files  is  fixed  and  that  this  set  constitutes  the  entire  program.  I 
abstract  away  the  class  file  format  and  the  linkage  process,  and  consider  a  program  to  be  a 
tuple  of  sets  and  functions  representing  the  information  in  the  class  files  after  parsing  and 
linking. 

These  sets  and  functions  are  described  in  terms  of  some  basic  types: 

• ClassIdentifier, the type of abstract names for classes.

• MethodIdentifier, the type of abstract names for methods.

• FieldIdentifier, the type of abstract names for fields.

In the Java Virtual Machine, a ClassIdentifier corresponds to a fully qualified class name
paired with a reference to the class loader that loaded it. A MethodIdentifier corresponds to
a method signature including a method name, a return type and a list of parameter types
(because overloading is resolved at compile time). A FieldIdentifier corresponds to the
name of a field paired with the class in which it was declared; an object can have multiple
fields of the same name, inherited from different classes.

ClassIdentifier has a distinguished subset ErrorClassIDs, representing the classes of exceptions
thrown by the runtime system (e.g. OutOfMemoryError or NullPointerException).

There  are  also  some  frequently  used  compound  types: 

• MethodImpl = ClassIdentifier × MethodIdentifier

Values of this type identify method implementations. The ClassIdentifier is the class
that implements the method, and the MethodIdentifier names the implemented method.
The following projection functions are useful:

MethodImplClass(classID, methodID) = classID
MethodImplName(classID, methodID) = methodID

• CodeLoc = MethodImpl × Z

This is the type of code locations. The MethodImpl identifies the method body, and the
integer is an offset within the method’s code. Only non-negative offsets are actually
used. The following projection functions are useful:

CodeLocMethod(method, offset) = method
CodeLocOffset(method, offset) = offset

The addition operator is overloaded at +: CodeLoc × Z → CodeLoc as follows:

(method, offset) + disp = (method, offset + disp)
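
As a rough illustration of these compound types, the following sketch models method implementations and code locations as immutable records and overloads `+` on code locations. Modeling identifiers as plain strings, and the names `class_id`, `method_id` and `"Example"`, are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodImpl:
    class_id: str   # ClassIdentifier, modeled here as a plain string
    method_id: str  # MethodIdentifier, modeled here as a plain string


@dataclass(frozen=True)
class CodeLoc:
    method: MethodImpl
    offset: int

    def __add__(self, disp: int) -> "CodeLoc":
        # (method, offset) + disp = (method, offset + disp)
        return CodeLoc(self.method, self.offset + disp)


main = MethodImpl("Example", "main")  # hypothetical entry point
loc = CodeLoc(main, 0) + 3            # three instructions past the start
```

The projection functions correspond to plain attribute access, for example `loc.method` and `loc.offset`.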

Some of the runtime structures use lists. The empty list is written as “ε” and list consing is
written as “::”. For example, 3::2::1::ε denotes a list of the first three positive integers.

The empty finite map is written as “[]”. The extension of a finite map M with a mapping
from k to v is written “M[k → v]”.
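
A minimal sketch of this notation, assuming lists are modeled as tuples and finite maps as dictionaries (the helper names `cons` and `extend` are hypothetical):

```python
EMPTY_LIST = ()  # the empty list


def cons(head, tail):
    """head :: tail, with the head stored first."""
    return (head,) + tail


def extend(m, k, v):
    """M[k -> v]: a copy of m in which k maps to v; m itself is unchanged."""
    out = dict(m)
    out[k] = v
    return out


first_three = cons(3, cons(2, cons(1, EMPTY_LIST)))  # 3::2::1::(empty)
m0 = {}                   # the empty finite map []
m1 = extend(m0, "x", 0)   # adds a mapping for x
m2 = extend(m1, "x", 1)   # overrides the mapping for x
```

Extension is non-destructive here, which matches how the transition rules describe a new state in terms of an old one.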

3.2.2  Programs 

A  program  is  a  tuple  of  several  components: 

• Main : MethodImpl

This  is  the  identifier  of  the  method  that  starts  the  program;  it  is  the  static  method  main 
of  some  class. 

• InitFields : ClassIdentifier ↦ (FieldIdentifier ↦ InitValue)

This  maps  each  class  in  the  program  to  the  initial  values  of  the  fields  when  an  object  of 
that  class  is  created.  Thus  it  encodes  which  fields  are  present  in  any  given  class  as  well 
as  their  default  values  (zero  for  scalars,  null  for  object  references).  InitFields  is  not 
defined  for  classes  which  cannot  be  instantiated  (i.e.,  interfaces  or  abstract  classes). 
InitValue  is  simply  either  “0”  or  “null”;  complicated  initialization  expressions  are 
actually  executed  in  each  object’s  constructor. 

• InitStaticFields : FieldIdentifier ↦ InitValue

This  finite  map  assigns  an  initial  value  to  each  static  field  in  the  program. 

• SubclassesOf : ClassIdentifier ↦ P(ClassIdentifier)

This  returns  the  set  of  subclasses  of  the  class.  If  the  class  is  actually  an  interface,  its 
subinterfaces  and  the  classes  implementing  it  are  included.  The  subclass  relation  is 
reflexively  and  transitively  closed. 

• Dispatch : ClassIdentifier × MethodIdentifier ↦ MethodImpl

This  partial  function  maps  a  class  and  a  method  signature  to  the  implementation  called 
when  the  method  is  invoked  on  an  object  of  the  given  class. 

• Instruction : CodeLoc ↦ Inst

This maps code locations to the instructions at those locations. The set of instructions
Inst is described in Figure 3-1. Except as noted, the names of the instructions are the
same as the names of their counterparts in the official Java Virtual Machine specification.

Inst ::= aconst_null
       | bipush byte
       | iadd
       | load index           (stands for aload*, iload* forms)
       | store index          (stands for astore*, istore* forms)
       | if_cmpeq offset      (stands for if_icmpeq, if_acmpeq)
       | goto offset
       | return               (stands for ireturn, areturn)
       | new classID
       | getfield fieldID
       | putfield fieldID
       | getstatic fieldID
       | putstatic fieldID
       | invokevirtual methodID
       | invokestatic methodImpl
       | checkcast classID
       | instanceof classID
       | athrow

Figure  3-1.  The  Micro  Java  Bytecode  instruction  set 

• CatchBlockOffset : CodeLoc × ClassIdentifier ↦ Z

This  partial  function  gives  the  code  offset  of  the  handler  invoked  when  an  exception  of 
a  given  class  is  thrown  at  a  specified  program  point.  It  is  undefined  if  the  exception 
should  be  propagated  to  the  calling  method.  This  function  is  computed  from  “catch 
region”  information  stored  in  the  class  files. 
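
To make two of these components concrete, the sketch below derives SubclassesOf and Dispatch from a toy class hierarchy. It assumes single inheritance with no interfaces and resolves a method by walking up the superclass chain; the thesis only requires that these functions exist, so the tables `parent` and `declares` and the lookup strategy are assumptions of this example.

```python
# Hypothetical hierarchy: class -> direct superclass (None for the root).
parent = {"Object": None, "A": "Object", "B": "A", "C": "A"}
# Hypothetical declarations: class -> method identifiers it implements.
declares = {"Object": {"toString"}, "A": {"run"}, "B": {"run"}, "C": set()}


def subclasses_of(class_id):
    """Reflexively and transitively closed set of subclasses of class_id."""
    result = set()
    for c in parent:
        cur = c
        while cur is not None:
            if cur == class_id:
                result.add(c)
                break
            cur = parent[cur]
    return result


def dispatch(class_id, method_id):
    """Partial function: the (class, method) implementation that runs when
    method_id is invoked on an object of class_id, or None if undefined."""
    cur = class_id
    while cur is not None:
        if method_id in declares[cur]:
            return (cur, method_id)
        cur = parent[cur]
    return None
```

For instance, `dispatch("C", "run")` resolves to the implementation inherited from `A`, while `dispatch("B", "run")` resolves to the override in `B`.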

The instruction aconst_null pushes a null reference onto the working stack. The
bipush instruction pushes an integer constant onto the stack. The iadd instruction pops
two integers off the working stack, adds them, and pushes the result back onto the stack. The
load and store instructions move values between the local variable file and
the working stack. The instruction if_cmpeq branches if the top of the stack is zero. The
goto instruction transfers control to another instruction within a method. Programs use the
return instruction to terminate the invocation of the current method and return a value to
the caller. The new instruction creates a new object instance of the given class. The
getfield and putfield instructions read and write the given field of the object
indicated by the reference on top of the working stack. Similar instructions getstatic
and putstatic read and write static fields; no object reference is required. The
invokevirtual instruction performs a dynamic method call to the method with
signature methodID as implemented by the object whose reference is the first method
parameter. The invokestatic instruction performs a static function call to the given
method. Both of the method invocation instructions take the top two elements of the
working stack as the parameters to the callee method. The checkcast instruction tests
whether the object referred to by the top of the working stack is an instance of a subclass
of the class specified in the instruction (or null); if it is, then no action is taken and the
object reference remains on the working stack, but if it is not, an exception is thrown.
Alternatively, instanceof performs a similar check and then stores the result in a boolean
value on top of the stack. The check is different because instanceof returns false if the
argument is null. The athrow instruction raises an exception; on entry to the instruction,
the top of stack holds a reference to the exception object to be raised.
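
The difference between checkcast and instanceof on a null argument can be sketched as follows. The table `SUBCLASSES_OF`, the exception class `ClassCastError`, and the encoding of the boolean result as 1 or 0 are assumptions of this example, not definitions taken from MJBC.

```python
class ClassCastError(Exception):
    """Stands in for the exception raised by a failed checkcast."""


# Hypothetical reflexive, transitive subclass table.
SUBCLASSES_OF = {"A": {"A", "B"}, "B": {"B"}, "C": {"C"}}


def checkcast(dynamic_class, target):
    """dynamic_class is the class of the referenced object, or None for null.
    Null and valid subclasses pass through unchanged; anything else raises."""
    if dynamic_class is None or dynamic_class in SUBCLASSES_OF[target]:
        return dynamic_class
    raise ClassCastError(f"{dynamic_class} is not a subclass of {target}")


def instanceof(dynamic_class, target):
    """Return 1 or 0; unlike checkcast, a null argument yields 0."""
    if dynamic_class is None:
        return 0
    return 1 if dynamic_class in SUBCLASSES_OF[target] else 0
```

So a null reference passes `checkcast` for any target class but is never reported as an instance by `instanceof`.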

The  instruction  set  was  designed  to  be  an  expressive  subset  of  the  JVM  instructions,  with 
some  streamlining,  e.g.,  there  are  no  per-datatype  variants  of  load/store  instructions, 
and  all  methods  take  exactly  two  parameters.  (I  chose  two  parameters  because  the  first 
parameter  is  usually  the  this  parameter  used  for  dispatch,  and  for  completeness  it  seems 
helpful  to  have  another  parameter  that  is  not  used  for  dispatch.)  Almost  all  the  interesting 
behaviors  of  Java  bytecode  instructions  are  captured  in  this  instruction  set,  with  the  notable 
omission  of  bytecode  subroutines,  which  are  of  no  importance  in  practice. 

MJBC  does  not  define  any  static  constraints  on  the  program  beyond  the  syntactic 
constraints  imposed  by  the  above  definitions.  In  this  respect  it  is  much  more  lenient  than 
the  JVM.  This  is  useful  because  it  shows  that  the  definitions  and  proofs  presented  in  this 
thesis  are  independent  of  any  particular  static  type  discipline  for  JVM  bytecode. 

3.2.3  State 

The  description  of  state  requires  some  additional  basic  types: 

•  ObjectReference,  the  type  of  heap  locations. 

• NullRef, the type of the null reference. There is just one value of this type, “null”.

The  type  of  values  is  defined  as: 

Value  =  Z  +  ObjectReference  +  NullRef 

There is a natural embedding of InitValue into Value that maps 0 to the 0 in Z, and maps
null  to  the  null  in  NullRef. 

The  semantic  rules  require  some  additional  compound  types: 

• HeapObj = ClassIdentifier × (FieldIdentifier ↦ Value)

A heap maps object references to values of this type. Heap objects retain their dynamic
class (used to dispatch virtual methods), and the current values of their fields. The following
projection functions are useful:

• HeapObjClass(classID, fields) = classID

• HeapObjFields(classID, fields) = fields

• StackFrame = CodeLoc × Value list × (Z ↦ Value)

A  tuple  of  the  form  (pc,  S,  A)  represents  the  saved  state  of  a  calling  method. 

•  pc  is  the  location  of  the  method  call  instruction  that  transferred  control  to  the  callee. 

•  A  is  the  saved  local  variables  of  the  calling  method,  defined  below. 

•  S  is  the  saved  working  stack  of  the  calling  method,  defined  below. 

A program state S is a record of the form

[mode: mode, pc: pc, wstack: S, locals: A, mstack: F, heap: H, globals: G]

where

• mode ∈ { Running, Throwing }

Throwing indicates that the program is in the process of throwing an exception.

• pc : CodeLoc

This is the location of the next instruction to be executed.

• S : Value list

The working stack is used to evaluate expressions, and is local to the currently executing
method. When an exception is being thrown, the stack contains a single element:
a reference to the exception object being thrown.

• A : Z ↦ Value

The local variable file is a finite map recording the state of the local variables. In JBC
and MJBC, local variables are numbered, not named. In MJBC all methods take two
parameters, so on entry to a method, A has mappings for local variables 0 and 1, holding
the actual values of the parameters.

• F : StackFrame list

This is the method invocation stack, recording the saved state of the methods above the
currently executing method in the call stack.

• H : ObjectReference ↦ HeapObj

The heap is a finite partial map from object references to the stored objects.

• G : FieldIdentifier ↦ Value

The globals are a finite map from each static field (i.e., global variable) to its value.

To  make  semantic  rules  shorter  and  more  readable,  state  records  are  written  in  the  form 
[elem_1 → value_1, ..., elem_n → value_n, p]

where p is a variable denoting arbitrary values for the additional elements. However,
whenever the element mode is given a value by p, then the value is required to be Running;
this is convenient because most patterns matching a state record are only applicable when
the machine is in the Running state.

3.2.4  Initial  State 

The  initial  state  is 

[mode: Running, pc: (Main, 0), wstack: ε, locals: [], mstack: ε, heap: [],
globals: InitStaticFields]

MJBC  does  not  define  any  notion  of  termination;  it  is  not  needed  for  the  purposes  of  this 
thesis. 
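
The state record and the initial state can be sketched as a small data class. The field layout follows the record above; the concrete representations (tuples for lists, dictionaries for finite maps, nested tuples for code locations) and the names `"Example"` and `"Example.counter"` are assumptions of this example.

```python
from dataclasses import dataclass


@dataclass
class State:
    mode: str      # "Running" or "Throwing"
    pc: tuple      # code location ((classID, methodID), offset)
    wstack: tuple  # working stack, top element first
    locals: dict   # local variable file: index -> value
    mstack: tuple  # saved frames (pc, wstack, locals) of the callers
    heap: dict     # object reference -> (classID, fields)
    globals: dict  # static field -> value


def initial_state(main, init_static_fields):
    """Initial state: empty stacks, locals and heap; globals start at their
    initial values; execution begins at offset 0 of the main method."""
    return State(
        mode="Running",
        pc=(main, 0),
        wstack=(),
        locals={},
        mstack=(),
        heap={},
        globals=dict(init_static_fields),
    )


s0 = initial_state(("Example", "main"), {"Example.counter": 0})
```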

3.2.5  Transition  Rules 

The transition relation is a relation over states. It contains an element S1 => S2 if and only
if in one step, the program in state S1 can progress to state S2.

In general a given state S1 can transition to more than one possible S2, because certain
exceptions can be “spontaneously” raised at any time, by transition rule (21). (In the Java
Virtual Machine, such exceptions can occur when the virtual machine runs out of memory
or encounters some other kind of critical error.) When a program encounters a runtime error
(e.g., it tries to pop an empty stack), no normal transition is possible. However, the program
is never “stuck” because it can always make a transition by raising a spontaneous
exception. This models the raising of exceptions in response to runtime errors: both
errors that would normally be caught by static checks, and errors that cannot be caught
statically, such as failed checkcast instructions throwing a ClassCastException.

The  transition  rules  are  given  in  Figure  3-2. 

The  exception  throwing  and  handling  mechanism  requires  some  explanation.  When  an 
exception  is  thrown  (rules  (20)  and  (21)),  the  current  working  stack  is  cleared  and  a 
reference to the exception object is pushed onto it. The state switches to Throwing mode.
In Throwing mode, at each step, control either transfers to an exception handler within the
current method (rule (22)), or leaves the current method to continue exception throwing at
the caller (rule (23)). In the latter case, the new pc is the location of the method call
instruction, rather than its successor as in the case of a normal return. This is necessary for
a catch block enclosing the method call instruction to correctly catch the exception. The
state switches back to Running mode when the exception is caught by a handler.
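
The following sketch is one reading of this prose description of a step in Throwing mode; it is not a transcription of rules (22) and (23). It assumes that CatchBlockOffset yields an absolute offset within the current method, represents states as dictionaries, and does not handle an empty invocation stack. The parameter names `catch_block_offset` and `class_of` are hypothetical.

```python
def throwing_step(state, catch_block_offset, class_of):
    """Perform one step in Throwing mode.

    state: dict with keys mode, pc, wstack, locals, mstack.
    catch_block_offset: dict (pc, exception class) -> handler offset; a
        missing key means the exception propagates to the caller.
    class_of: function from the exception reference to its class.
    """
    method, _offset = state["pc"]
    exc_ref = state["wstack"][0]  # the only element while throwing
    key = (state["pc"], class_of(exc_ref))
    if key in catch_block_offset:
        # A handler exists in the current method: resume Running there.
        return dict(state, mode="Running",
                    pc=(method, catch_block_offset[key]))
    # No handler: restore the caller's frame and keep throwing. The new pc
    # is the call instruction itself, not its successor, so that a catch
    # region enclosing the call can still match.
    call_pc, _saved_stack, saved_locals = state["mstack"][0]
    return dict(state, pc=call_pc, wstack=(exc_ref,),
                locals=saved_locals, mstack=state["mstack"][1:])
```

Note that the caller's saved working stack is discarded, since the stack holds only the exception reference while an exception is being thrown.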

3.2.6  Differences  between  JBC  and  MJBC 

The  following  features  of  full  JBC  have  been  omitted  or  abstracted  away  in  MJBC:  threads 
and  their  associated  synchronization  operations,  arrays,  scalar  types  other  than  int,  finite 
precision/finite  bit-width  arithmetic,  access  control  (via  packages,  public,  private 
and  protected),  native  methods,  the  fact  that  instructions  have  variable  lengths, 
complex  control  instructions  such  as  lookupswitch  and  tableswitch,  variations  on 
simple  instructions  such  as  wide,  instructions  with  the  same  semantics  that  vary  only  in 
the  types  of  their  arguments  (which  exist  to  aid  the  Java  bytecode  verifier),  convenience 
instructions  for  manipulating  the  stack  such  as  dup,  the  full  suite  of  arithmetic  operators, 
the  specialized  method  invocation  instructions  invokespecial  and 
invokeinterface, methods that return void, methods that take more or fewer than two
parameters, bytecode subroutines, the runtime error exceptions thrown by various instructions
(e.g., NullPointerException), garbage collection and finalization, multiple
classloaders,  details  of  the  class  file  format,  and  dynamic  loading. 

However,  it  does  have  the  stack-based  instruction  set,  local  variables,  integer  and  object 
types  (with  classes  and  interfaces),  exceptions  (both  explicitly  and  implicitly  thrown)  and 
exception  handling,  dynamic  type  checks,  and  virtual  and  static  methods  and  fields.  The 
JBC  does  not  have  constructors,  since  these  are  reduced  to  method  calls  at  the  bytecode 
level;  therefore  MJBC  does  not  have  constructors  either. 

The  features  abstracted  away  in  MJBC  to  simplify  the  formal  presentation  are  still  handled 
by  the  Ajax  implementation.  Most  of  the  features  are  straightforward.  Chapter  8  discusses 
issues  related  to  native  code  and  dynamic  loading. 

The Java Virtual Machine calls the finalize() methods on objects as they are garbage
collected. This can happen at any time after the object becomes garbage. Ajax models this
as a call to finalize() on every object that can happen at any time. This is slightly more
general  than  the  actual  behavior,  but  none  of  the  implemented  or  contemplated  analyses 
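would be sensitive enough to detect the difference.

\[
\frac{\mathrm{Instruction}(pc) = \texttt{aconst\_null}}
     {[\mathrm{pc}: pc,\ \mathrm{wstack}: \mathcal{S},\ \rho] \Rightarrow [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \mathit{null} :: \mathcal{S},\ \rho]}
\tag{1}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{bipush}\ \mathit{byte}}
     {[\mathrm{pc}: pc,\ \mathrm{wstack}: \mathcal{S},\ \rho] \Rightarrow [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \mathit{byte} :: \mathcal{S},\ \rho]}
\tag{2}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{iadd}}
     {[\mathrm{pc}: pc,\ \mathrm{wstack}: v_1 :: v_2 :: \mathcal{S},\ \rho] \Rightarrow [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: (v_1 + v_2) :: \mathcal{S},\ \rho]}
\tag{3}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{load}\ \mathit{index}}
     {[\mathrm{pc}: pc,\ \mathrm{wstack}: \mathcal{S},\ \mathrm{locals}: \mathcal{L},\ \rho] \Rightarrow [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \mathcal{L}(\mathit{index}) :: \mathcal{S},\ \mathrm{locals}: \mathcal{L},\ \rho]}
\tag{4}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{store}\ \mathit{index}}
     {[\mathrm{pc}: pc,\ \mathrm{wstack}: v :: \mathcal{S},\ \mathrm{locals}: \mathcal{L},\ \rho] \Rightarrow [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \mathcal{S},\ \mathrm{locals}: \mathcal{L}[\mathit{index} \mapsto v],\ \rho]}
\tag{5}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{if\_cmpeq}\ \mathit{offset} \qquad v \neq 0}
     {[\mathrm{pc}: pc,\ \mathrm{wstack}: v :: \mathcal{S},\ \rho] \Rightarrow [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \mathcal{S},\ \rho]}
\tag{6}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{if\_cmpeq}\ \mathit{offset} \qquad v = 0}
     {[\mathrm{pc}: pc,\ \mathrm{wstack}: v :: \mathcal{S},\ \rho] \Rightarrow [\mathrm{pc}: pc + \mathit{offset},\ \mathrm{wstack}: \mathcal{S},\ \rho]}
\tag{7}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{goto}\ \mathit{offset}}
     {[\mathrm{pc}: pc,\ \rho] \Rightarrow [\mathrm{pc}: pc + \mathit{offset},\ \rho]}
\tag{8}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{return}}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: v :: \mathcal{S},\ \mathrm{locals}: \mathcal{L},\ \mathrm{mstack}: (pc', \mathcal{S}', \mathcal{L}') :: \mathcal{M},\ \rho] \\
      \quad \Rightarrow [\mathrm{pc}: pc' + 1,\ \mathrm{wstack}: v :: \mathcal{S}',\ \mathrm{locals}: \mathcal{L}',\ \mathrm{mstack}: \mathcal{M},\ \rho]
      \end{array}}
\tag{9}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{new}\ \mathit{classID} \qquad \mathit{ref} \notin \mathrm{dom}\ \mathcal{H}}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: \mathcal{S},\ \mathrm{heap}: \mathcal{H},\ \rho] \\
      \quad \Rightarrow [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \mathit{ref} :: \mathcal{S},\ \mathrm{heap}: \mathcal{H}[\mathit{ref} \mapsto (\mathit{classID}, \mathrm{InitFields}(\mathit{classID}))],\ \rho]
      \end{array}}
\tag{10}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{getfield}\ \mathit{fieldID}}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{ref} :: \mathcal{S},\ \mathrm{heap}: \mathcal{H},\ \rho] \\
      \quad \Rightarrow [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \mathrm{HeapObjFields}(\mathcal{H}(\mathit{ref}))(\mathit{fieldID}) :: \mathcal{S},\ \mathrm{heap}: \mathcal{H},\ \rho]
      \end{array}}
\tag{11}
\]

\[
\frac{\begin{array}{c}
      \mathrm{Instruction}(pc) = \texttt{putfield}\ \mathit{fieldID} \\
      \mathit{classID} = \mathrm{HeapObjClass}(\mathcal{H}(\mathit{ref})) \\
      \mathit{fields} = \mathrm{HeapObjFields}(\mathcal{H}(\mathit{ref})) \\
      \mathit{fieldID} \in \mathrm{dom}\ \mathrm{InitFields}(\mathit{classID})
      \end{array}}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: v :: \mathit{ref} :: \mathcal{S},\ \mathrm{heap}: \mathcal{H},\ \rho] \\
      \quad \Rightarrow [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \mathcal{S},\ \mathrm{heap}: \mathcal{H}[\mathit{ref} \mapsto (\mathit{classID}, \mathit{fields}[\mathit{fieldID} \mapsto v])],\ \rho]
      \end{array}}
\tag{12}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{getstatic}\ \mathit{fieldID}}
     {[\mathrm{pc}: pc,\ \mathrm{wstack}: \mathcal{S},\ \mathrm{globals}: \mathcal{G},\ \rho] \Rightarrow [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \mathcal{G}(\mathit{fieldID}) :: \mathcal{S},\ \mathrm{globals}: \mathcal{G},\ \rho]}
\tag{13}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{putstatic}\ \mathit{fieldID} \qquad \mathit{fieldID} \in \mathrm{dom}\ \mathcal{G}}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: v :: \mathcal{S},\ \mathrm{globals}: \mathcal{G},\ \rho] \\
      \quad \Rightarrow [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \mathcal{S},\ \mathrm{globals}: \mathcal{G}[\mathit{fieldID} \mapsto v],\ \rho]
      \end{array}}
\tag{14}
\]

\[
\frac{\begin{array}{c}
      \mathrm{Instruction}(pc) = \texttt{invokevirtual}\ \mathit{methodID} \\
      pc' = (\mathrm{Dispatch}(\mathrm{HeapObjClass}(\mathcal{H}(v_0)), \mathit{methodID}), 0)
      \end{array}}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: v_1 :: v_0 :: \mathcal{S},\ \mathrm{locals}: \mathcal{L},\ \mathrm{mstack}: \mathcal{M},\ \mathrm{heap}: \mathcal{H},\ \rho] \\
      \quad \Rightarrow [\mathrm{pc}: pc',\ \mathrm{wstack}: \varepsilon,\ \mathrm{locals}: [0 \mapsto v_0, 1 \mapsto v_1],\ \mathrm{mstack}: (pc, \mathcal{S}, \mathcal{L}) :: \mathcal{M},\ \mathrm{heap}: \mathcal{H},\ \rho]
      \end{array}}
\tag{15}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{invokestatic}\ \mathit{methodImpl} \qquad pc' = (\mathit{methodImpl}, 0)}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: v_1 :: v_0 :: \mathcal{S},\ \mathrm{locals}: \mathcal{L},\ \mathrm{mstack}: \mathcal{M},\ \rho] \\
      \quad \Rightarrow [\mathrm{pc}: pc',\ \mathrm{wstack}: \varepsilon,\ \mathrm{locals}: [0 \mapsto v_0, 1 \mapsto v_1],\ \mathrm{mstack}: (pc, \mathcal{S}, \mathcal{L}) :: \mathcal{M},\ \rho]
      \end{array}}
\tag{16}
\]

\[
\frac{\begin{array}{c}
      \mathrm{Instruction}(pc) = \texttt{checkcast}\ \mathit{classID} \\
      \mathit{ref} = \mathit{null} \ \vee\ \mathrm{HeapObjClass}(\mathcal{H}(\mathit{ref})) \in \mathrm{SubclassesOf}(\mathit{classID})
      \end{array}}
     {[\mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{ref} :: \mathcal{S},\ \mathrm{heap}: \mathcal{H},\ \rho] \Rightarrow [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \mathit{ref} :: \mathcal{S},\ \mathrm{heap}: \mathcal{H},\ \rho]}
\tag{17}
\]

\[
\frac{\begin{array}{c}
      \mathrm{Instruction}(pc) = \texttt{instanceof}\ \mathit{classID} \\
      \mathrm{HeapObjClass}(\mathcal{H}(\mathit{ref})) \in \mathrm{SubclassesOf}(\mathit{classID})
      \end{array}}
     {[\mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{ref} :: \mathcal{S},\ \mathrm{heap}: \mathcal{H},\ \rho] \Rightarrow [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: 1 :: \mathcal{S},\ \mathrm{heap}: \mathcal{H},\ \rho]}
\tag{18}
\]

\[
\frac{\begin{array}{c}
      \mathrm{Instruction}(pc) = \texttt{instanceof}\ \mathit{classID} \\
      \mathit{ref} = \mathit{null} \ \vee\ \mathrm{HeapObjClass}(\mathcal{H}(\mathit{ref})) \notin \mathrm{SubclassesOf}(\mathit{classID})
      \end{array}}
     {[\mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{ref} :: \mathcal{S},\ \mathrm{heap}: \mathcal{H},\ \rho] \Rightarrow [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: 0 :: \mathcal{S},\ \mathrm{heap}: \mathcal{H},\ \rho]}
\tag{19}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{athrow} \qquad \mathit{ref} \neq \mathit{null}}
     {\begin{array}{l}
      [\mathrm{mode}: \mathrm{RUNNING},\ \mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{ref} :: \mathcal{S},\ \rho] \\
      \quad \Rightarrow [\mathrm{mode}: \mathrm{THROWING},\ \mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{ref} :: \varepsilon,\ \rho]
      \end{array}}
\tag{20}
\]

\[
\frac{\begin{array}{c}
      \mathit{classID} \in \mathrm{ErrorClassIDs} \\
      \mathit{ref} \notin \mathrm{dom}\ \mathcal{H} \\
      \mathit{obj} = (\mathit{classID}, \mathrm{InitFields}(\mathit{classID}))
      \end{array}}
     {\begin{array}{l}
      [\mathrm{mode}: \mathrm{RUNNING},\ \mathrm{pc}: pc,\ \mathrm{wstack}: \mathcal{S},\ \mathrm{heap}: \mathcal{H},\ \rho] \\
      \quad \Rightarrow [\mathrm{mode}: \mathrm{THROWING},\ \mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{ref} :: \varepsilon,\ \mathrm{heap}: \mathcal{H}[\mathit{ref} \mapsto \mathit{obj}],\ \rho]
      \end{array}}
\tag{21}
\]

\[
\frac{\mathit{handler} = \mathrm{CatchBlockOffset}((\mathit{method}, \mathit{offset}), \mathrm{HeapObjClass}(\mathcal{H}(\mathit{ref})))}
     {\begin{array}{l}
      [\mathrm{mode}: \mathrm{THROWING},\ \mathrm{pc}: (\mathit{method}, \mathit{offset}),\ \mathrm{wstack}: \mathit{ref} :: \varepsilon,\ \mathrm{heap}: \mathcal{H},\ \rho] \\
      \quad \Rightarrow [\mathrm{mode}: \mathrm{RUNNING},\ \mathrm{pc}: (\mathit{method}, \mathit{handler}),\ \mathrm{wstack}: \mathit{ref} :: \varepsilon,\ \mathrm{heap}: \mathcal{H},\ \rho]
      \end{array}}
\tag{22}
\]

\[
\frac{((\mathit{method}, \mathit{offset}), \mathrm{HeapObjClass}(\mathcal{H}(\mathit{ref}))) \notin \mathrm{dom}\ \mathrm{CatchBlockOffset}}
     {\begin{array}{l}
      [\mathrm{mode}: \mathrm{THROWING},\ \mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{ref} :: \varepsilon,\ \mathrm{locals}: \mathcal{L},\ \mathrm{mstack}: (pc', \mathcal{S}', \mathcal{L}') :: \mathcal{M},\ \mathrm{heap}: \mathcal{H},\ \rho] \\
      \quad \Rightarrow [\mathrm{mode}: \mathrm{THROWING},\ \mathrm{pc}: pc',\ \mathrm{wstack}: \mathit{ref} :: \varepsilon,\ \mathrm{locals}: \mathcal{L}',\ \mathrm{mstack}: \mathcal{M},\ \mathrm{heap}: \mathcal{H},\ \rho]
      \end{array}}
\tag{23}
\]

Figure 3-2. Rules defining the transition relation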

The  most  significant  issue  is  threads.  Ajax  uses  the  definition  of  the  VPR  presented  here, 
but  assumes  that  a  program  state  includes  a  list  of  thread  stacks,  and  that  the  semantics  of 
JBC  include  non-deterministic  context  switching  transitions.  Handling  threads  has  no 
practical  consequences  for  the  implementation  of  Ajax,  because  the  analyses  implemented 
in  Ajax  to  date  are  oblivious  to  the  order  in  which  statements  are  executed  (as  far  as  the 
heap  is  concerned,  which  is  where  all  inter- thread  interference  occurs). 

3.3  The  Value-Point  Relation 

3.3.1  Bytecode  Expressions 

To  describe  the  properties  of  a  program,  it  is  useful  to  be  able  to  name  values  such  as  stack 
elements  and  local  variables  at  particular  program  points.  Thus  I  define  a  small  language 
of  “bytecode  expressions”,  shown  in  Figure  3-3. 


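\[
\begin{array}{rcl}
\mathit{BExp}       & ::= & pc : \mathit{BExpPath} \\
\mathit{BExpPath}   & ::= & \mathit{BExpRoot}\ \mathit{BExpFields} \\
\mathit{BExpRoot}   & ::= & \texttt{stack-}n \\
                    & \mid & \texttt{local-}n \\
                    & \mid & \mathit{FieldID} \\
                    & \mid & \texttt{exn} \\
\mathit{BExpFields} & ::= & .\ \mathit{FieldID}\ \mathit{BExpFields} \\
                    & \mid & \varepsilon
\end{array}
\]

Figure 3-3. The language of bytecode expressions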


A  bytecode  expression  includes  a  code  location  for  context;  a  BExpRoot  designating  a  stack 
element,  local  variable,  static  field  or  currently-throwing  exception;  and  an  optional  list  of 
fields  to  be  dereferenced.  Each  FieldID  is  fully  qualified  by  the  name  of  the  class  the  field 
is  declared  in. 

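Given a program state, a bytecode expression can be evaluated to a value. An expression
may not evaluate to any value if an object does not have an appropriate field, or a stack
element or local variable does not exist, or the state's program counter is not at the
location specified in the expression. The rules for evaluating an expression $B$ in state
$\Sigma$, giving a partial judgement of the form $(\Sigma, B) \Downarrow v$, are given in Figure 3-4.

\[
\frac{}
     {([\mathrm{mode}: \mathrm{RUNNING},\ \mathrm{pc}: pc,\ \mathrm{wstack}: v_0 :: \ldots :: v_n :: \mathcal{S},\ \rho],\ pc : \texttt{stack-}n) \Downarrow v_n}
\tag{24}
\]

\[
\frac{\mathcal{L}(n) = v}
     {([\mathrm{mode}: \mathit{mode},\ \mathrm{pc}: pc,\ \mathrm{locals}: \mathcal{L},\ \rho],\ pc : \texttt{local-}n) \Downarrow v}
\tag{25}
\]

\[
\frac{}
     {([\mathrm{mode}: \mathrm{THROWING},\ \mathrm{pc}: pc,\ \mathrm{wstack}: v :: \varepsilon,\ \rho],\ pc : \texttt{exn}) \Downarrow v}
\tag{26}
\]

\[
\frac{\mathcal{G}(\mathit{staticField}) = v}
     {([\mathrm{mode}: \mathit{mode},\ \mathrm{pc}: pc,\ \mathrm{globals}: \mathcal{G},\ \rho],\ pc : \mathit{staticField}) \Downarrow v}
\tag{27}
\]

\[
\frac{\begin{array}{c}
      ([\mathrm{mode}: \mathit{mode},\ \mathrm{pc}: pc,\ \mathrm{heap}: \mathcal{H},\ \rho],\ pc : \mathit{exp}) \Downarrow u \\
      \mathrm{HeapObjFields}(\mathcal{H}(u))(\mathit{field}) = v
      \end{array}}
     {([\mathrm{mode}: \mathit{mode},\ \mathrm{pc}: pc,\ \mathrm{heap}: \mathcal{H},\ \rho],\ pc : \mathit{exp}\,.\,\mathit{field}) \Downarrow v}
\tag{28}
\]

Figure 3-4. Rules defining the evaluation of bytecode expressions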

The rule for stack-n extracts the n-th element of the stack, if the program is not throwing
an exception. The rule for local-n extracts the n-th local variable; local variables are
available  whether  or  not  the  program  is  throwing  an  exception.  The  exn  expression  is 
available  only  when  the  program  is  throwing  an  exception;  the  currently  throwing 
exception  is  stored  on  the  top  of  the  stack.  The  values  of  static  fields  are  extracted  from  the 
static  field  map.  Field  dereference  expressions  first  evaluate  the  dereferenced  expression; 
if  that  returns  a  value,  then  it  is  looked  up  in  the  heap  and  the  field  of  the  resulting  object  is 
extracted. 
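The evaluation rules just described can be sketched as a partial function. This is an illustration under assumed encodings, not code from Ajax: a bytecode expression is a small frozen dataclass, a state is a dict, the heap maps references to (classID, fields) pairs, code locations are integers, and the Python value None stands for "no rule applies" (MJBC null is written as the string "null" to keep the two apart).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BExp:
    pc: int                      # code location (an int here, by assumption)
    root: Tuple[str, object]     # ("stack", n), ("local", n), ("static", id) or ("exn", None)
    fields: Tuple[str, ...] = () # fully qualified field names to dereference


def evaluate(state: dict, e: BExp) -> Optional[object]:
    """Return the value of e in state, or None if e has no value there."""
    if state["pc"] != e.pc:
        return None
    kind, arg = e.root
    if kind == "stack":          # only while running normally
        if state["mode"] != "RUNNING" or arg >= len(state["wstack"]):
            return None
        value = state["wstack"][arg]
    elif kind == "local":        # available in either mode
        if arg not in state["locals"]:
            return None
        value = state["locals"][arg]
    elif kind == "exn":          # only while throwing; exception is on top of the stack
        if state["mode"] != "THROWING":
            return None
        value = state["wstack"][0]
    elif kind == "static":
        if arg not in state["globals"]:
            return None
        value = state["globals"][arg]
    else:
        return None
    for f in e.fields:           # field dereference, applied left to right
        obj = state["heap"].get(value)
        if obj is None or f not in obj[1]:
            return None
        value = obj[1][f]
    return value


state = {"mode": "RUNNING", "pc": 7, "wstack": ["r1", 4], "locals": {0: "r1"},
         "globals": {"Main.count": 4},
         "heap": {"r1": ("Point", {"Point.x": 10})}}
x_of_top = evaluate(state, BExp(7, ("stack", 0), ("Point.x",)))
```

Returning None for every failure case mirrors the partiality of the judgement: an expression at the wrong program counter, a missing local, or a dereference of a non-object simply has no value.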

3.3.2  The  Value-Point  Relation 

A trace $T$ of a program $P$ is a sequence of states $\langle \Sigma_0, \ldots, \Sigma_n \rangle$ such that $\Sigma_0$ is the initial
program state for program $P$, and $\forall\, 0 < i \le n.\ \Sigma_{i-1} \Rightarrow \Sigma_i$.

Let $e_1$ and $e_2$ be bytecode expressions. Define the value-point relation $\leftrightarrow_P$ of a program $P$
as follows:

$e_1 \leftrightarrow_P e_2$ iff

$\exists$ a trace $T$ of $P$ and states $\Sigma_i$ and $\Sigma_j$ in $T$, such that $(\Sigma_i, e_1) \Downarrow v$ and $(\Sigma_j, e_2) \Downarrow v$ for
some value $v$, where $v$ is not equal to null.

Informally,  two  bytecode  expressions  are  related  if  there  is  a  common  value  v  that  both 
expressions  evaluate  to.  If  v  is  an  object  reference,  then  the  two  expressions  are  aliased. 
Such  a  v  is  called  a  witness  value. 

Null  values  are  not  permitted  as  witnesses  because  aliasing  is  only  induced  when  the  two 
expressions  refer  to  actual  objects. 
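The definition can be read operationally over any given collection of traces. The sketch below is only an illustration: it examines the traces it is handed, so unless those cover every execution of the program its result is a subset of the true relation. The function name, the NULL stand-in, and the convention that the evaluator returns None for "no value" are assumptions of this sketch.

```python
from itertools import product

NULL = "null"   # stand-in for the MJBC null value (an assumption of this sketch)


def value_point_relation(traces, expressions, evaluate):
    """Pairs (e1, e2) that share a non-null witness value within one trace.

    `evaluate(state, e)` must return the value of e in state, or None when
    e has no value there.
    """
    related = set()
    for trace in traces:
        seen = {}                                   # expression -> values it took
        for state in trace:
            for e in expressions:
                v = evaluate(state, e)
                if v is not None and v != NULL:     # null is never a witness
                    seen.setdefault(e, set()).add(v)
        for e1, e2 in product(seen, repeat=2):
            if seen[e1] & seen[e2]:                 # a common witness exists
                related.add((e1, e2))
    return related


# Toy states: each state directly maps expression names to their values.
toy_trace = [{"a": "o1", "b": NULL}, {"b": "o1", "c": NULL}]
vpr = value_point_relation([toy_trace], ["a", "b", "c"], lambda s, e: s.get(e))
```

In the toy trace, a and b are related through the witness o1, while c only ever holds null and is therefore related to nothing, not even itself.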


3.4 Generalizing Alias Analysis Using Tagging


3.4.1  Overview 

The  VPR  as  defined  above  does  not  only  relate  expressions  yielding  object  references.  It 
can  also  relate  expressions  yielding  scalar  values  (integers,  in  MJBC).  However, 
computing a sound approximation to the definition above would require analysis of
arithmetic, which is difficult to do efficiently. The definition would also not be very useful,
because  most  pairs  of  expressions  take  on  overlapping  ranges  of  values  (including,  e.g., 
zero). 

A  more  useful  definition  distinguishes  expressions  having  the  same  value  by  an  accident  of 
arithmetic from expressions yielding values copied from some common source.
Conceptually, scalar values can be treated as “boxed” and alias analysis performed on the box
objects.  This  enables  tracking  of  the  propagation  and  use  of  scalar  values  as  well  as  objects. 

Formally,  we  construct  an  “instrumented”  semantics  for  MJBC  associating  labels  with 
values.  The  labels,  called  tags,  are  similar  to  object  references.  When  a  scalar  value  is 
“created”  by  using  a  constant  or  performing  arithmetic,  a  fresh  tag  is  generated  and 
associated  with  the  value  to  form  a  tagged  value.  Two  tagged  values  may  have  the  same 
actual  value  but  different  tags.  For  example,  two  expressions  may  both  evaluate  to  tagged 
values  of  zero,  but  with  different  tags,  indicating  that  the  values  were  not  obtained  from  a 
common  source. 

Tags  on  non-null  object  references  are  superfluous,  because  two  equal  object  references 
must  have  the  same  tag;  the  MJBC  semantics  never  reuse  a  heap  location  once  it  has  been 
allocated.  However,  all  values  are  tagged  for  the  sake  of  uniformity. 
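The distinction between equal values and values with a common source can be illustrated with a few lines of code. This is a minimal sketch assuming tags are integers drawn from a counter; the names Tagged, create and copy are hypothetical and chosen only for this example.

```python
import itertools
from typing import NamedTuple


class Tagged(NamedTuple):
    value: object
    tag: int


_fresh = itertools.count()   # hypothetical source of fresh tags


def create(value):
    """Model value creation (a constant or an arithmetic result): fresh tag."""
    return Tagged(value, next(_fresh))


def copy(tv):
    """Model a load, store or field access: the tagged value passes through unchanged."""
    return tv


zero_a = create(0)           # e.g. the result of one bipush 0
zero_b = create(0)           # e.g. the result of a different bipush 0
alias_of_a = copy(zero_a)    # e.g. zero_a stored to a local and loaded again
```

Here zero_a and zero_b carry the same actual value but different tags, so they are not related, whereas alias_of_a shares both value and tag with zero_a.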

3.4.2  Tagged  State 

Tags  are  drawn  from  an  infinite  uninterpreted  set,  Tag. 

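Tagged values are defined as

•  $\underline{\mathit{Value}} = \mathit{Value} \times \mathit{Tag}$

The following projection function is useful:

•  $\mathit{Val}(\mathit{value}, \mathit{tag}) = \mathit{value}$

The following derived types follow immediately:

•  $\underline{\mathit{HeapObj}} = \mathit{ClassIdentifier} \times (\mathit{FieldIdentifier} \mapsto \underline{\mathit{Value}})$

•  $\underline{\mathit{StackFrame}} = \mathit{CodeLoc} \times \underline{\mathit{Value}}\ \mathrm{list} \times (\mathbb{Z} \mapsto \underline{\mathit{Value}})$

A tagged program state is a record of the form

$[\mathrm{mode}: \mathit{mode},\ \mathrm{pc}: pc,\ \mathrm{wstack}: \underline{\mathcal{S}},\ \mathrm{locals}: \underline{\mathcal{L}},\ \mathrm{mstack}: \underline{\mathcal{M}},\ \mathrm{heap}: \underline{\mathcal{H}},\ \mathrm{globals}: \underline{\mathcal{G}},\ \mathrm{used}: \mathit{used}]$

where

•  $\mathit{mode} : \{\mathrm{RUNNING}, \mathrm{THROWING}\}$

•  $pc : \mathit{CodeLoc}$

•  $\underline{\mathcal{S}} : \underline{\mathit{Value}}\ \mathrm{list}$

•  $\underline{\mathcal{L}} : \mathbb{Z} \mapsto \underline{\mathit{Value}}$

•  $\underline{\mathcal{M}} : \underline{\mathit{StackFrame}}\ \mathrm{list}$

•  $\underline{\mathcal{H}} : \mathit{ObjectReference} \mapsto \underline{\mathit{HeapObj}}$

•  $\underline{\mathcal{G}} : \mathit{FieldIdentifier} \mapsto \underline{\mathit{Value}}$

•  $\mathit{used} : \mathcal{P}(\mathit{Tag})$

This part of the state records all the tags that have been allocated so far in the execution.
This is used to help generate unique fresh tags. This set is always finite.

I define the projection functions Mode, PC, WStack, Locals, Globals, MStack, Heap and
Used to return the corresponding component of a tagged state.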

The initial tagged state is

$[\mathrm{mode}: \mathrm{RUNNING},\ \mathrm{pc}: (\mathit{Main}, 0),\ \mathrm{wstack}: \varepsilon,\ \mathrm{locals}: [\,],\ \mathrm{mstack}: \varepsilon,\ \mathrm{heap}: [\,],\ \mathrm{globals}: \underline{\mathit{InitStaticFields}},\ \mathrm{used}: \mathrm{range}\ \mathit{InitialTags}]$

where InitialTags is any bijection from the domain of InitStaticFields (the static fields used
by the program) to some subset of Tag. $\underline{\mathit{InitStaticFields}}$ is defined to have the same domain
as InitStaticFields, and

$\underline{\mathit{InitStaticFields}}(f) = (\mathit{InitStaticFields}(f),\ \mathit{InitialTags}(f))$

In other words, in the initial state, every global variable is initialized to zero or null, each
with a unique tag.
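As a concrete reading of this definition, the sketch below builds an initial tagged state from an untagged static-field map. The dict encoding, the use of small integers as tags, the use of Python's None for null, and the function name are all assumptions made for illustration.

```python
def initial_tagged_state(init_static_fields):
    """Build the initial tagged state from the untagged static-field map."""
    # Any one-to-one assignment of tags works; numbering the fields in
    # sorted order is just one convenient choice.
    initial_tags = {f: i for i, f in enumerate(sorted(init_static_fields))}
    return {
        "mode": "RUNNING",
        "pc": ("Main", 0),
        "wstack": [],
        "locals": {},
        "mstack": [],
        "heap": {},
        "globals": {f: (v, initial_tags[f]) for f, v in init_static_fields.items()},
        "used": set(initial_tags.values()),   # exactly the range of initial_tags
    }


# Two hypothetical static fields: an int initialized to 0, a reference to null.
initial = initial_tagged_state({"Main.count": 0, "Main.cache": None})
```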

3.4.3  Tagged  Transition  Rules 

The  inference  rules  defining  the  tagged  transition  relation  are  given  in  Figure  3-5. 

These  rules  are  almost  identical  to  the  untagged  transition  rules.  There  are  two  sets  of 
differences. Whenever a new value is created (by aconst_null, bipush, iadd, new,
instanceof, or a runtime exception throw), a fresh tag t is chosen nondeterministically
and  associated  with  the  new  value.  Also,  whenever  the  actual  value  of  a  tagged  value  is 
required,  a  Val  projection  is  inserted. 
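The two differences can be seen in a small executable sketch of the tagged rules for bipush, iadd, load and store. It assumes the same hypothetical encoding as an untagged interpreter would use (dict states, tuple instructions, list stacks with the top first), represents a tagged value as a (value, tag) pair, and resolves the nondeterministic choice of a fresh tag by taking the smallest unused integer.

```python
def fresh_tag(used):
    """Pick some tag not in `used`; the rules permit any such choice."""
    t = 0
    while t in used:
        t += 1
    return t


def tagged_step(code, state):
    """Apply one tagged transition rule and return the successor state."""
    pc, stack, used = state["pc"], state["wstack"], state["used"]
    op = code[pc]
    if op[0] == "bipush":                  # value creation: fresh tag
        t = fresh_tag(used)
        return {**state, "pc": pc + 1, "wstack": [(op[1], t)] + stack,
                "used": used | {t}}
    if op[0] == "iadd":                    # value creation: fresh tag
        (a, _), (b, _), *rest = stack      # Val projections drop the operand tags
        t = fresh_tag(used)
        return {**state, "pc": pc + 1, "wstack": [(a + b, t)] + rest,
                "used": used | {t}}
    if op[0] == "load":                    # copy: tag travels with the value
        return {**state, "pc": pc + 1,
                "wstack": [state["locals"][op[1]]] + stack}
    if op[0] == "store":                   # copy: tag travels with the value
        v, *rest = stack
        return {**state, "pc": pc + 1, "wstack": rest,
                "locals": {**state["locals"], op[1]: v}}
    raise ValueError(f"no rule applies to {op!r}")


code = [("bipush", 0), ("store", 0), ("load", 0), ("bipush", 0), ("iadd",)]
state = {"pc": 0, "wstack": [], "locals": {}, "used": frozenset()}
states = [state]
for _ in code:
    state = tagged_step(code, state)
    states.append(state)
```

After the load, the stack holds the same tagged zero that was stored; the second bipush produces a zero with a different tag, and iadd produces a third tag for the sum.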


3.4.4  Correspondence  Between  Tagged  Semantics  and  Untagged 
Semantics 

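Define the function Untag from tagged states to untagged states as follows:

\[
\begin{array}{l}
\mathit{Untag}([\mathrm{mode}: \mathit{mode},\ \mathrm{pc}: pc,\ \mathrm{wstack}: \underline{\mathcal{S}},\ \mathrm{locals}: \underline{\mathcal{L}},\ \mathrm{mstack}: \underline{\mathcal{M}},\ \mathrm{heap}: \underline{\mathcal{H}},\ \mathrm{globals}: \underline{\mathcal{G}},\ \mathrm{used}: \mathit{used}]) \\
\quad = [\mathrm{mode}: \mathit{mode},\ \mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{Untag}_S(\underline{\mathcal{S}}),\ \mathrm{locals}: \mathit{Untag}_L(\underline{\mathcal{L}}),\ \mathrm{mstack}: \mathit{Untag}_M(\underline{\mathcal{M}}), \\
\qquad\ \mathrm{heap}: \mathit{Untag}_H(\underline{\mathcal{H}}),\ \mathrm{globals}: \mathit{Untag}_G(\underline{\mathcal{G}})]
\end{array}
\]

In other words, Untag just strips off all the tags from the state.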

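It is also useful to define $\mathit{Untag}_\rho(\rho)$ to untag partial records $\rho$.

\[
\frac{\mathrm{Instruction}(pc) = \texttt{aconst\_null} \qquad t \notin \mathit{used}}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: \underline{\mathcal{S}},\ \mathrm{used}: \mathit{used},\ \rho] \\
      \quad \underline{\Rightarrow}\ [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: (\mathit{null}, t) :: \underline{\mathcal{S}},\ \mathrm{used}: \mathit{used} \cup \{t\},\ \rho]
      \end{array}}
\tag{29}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{bipush}\ \mathit{byte} \qquad t \notin \mathit{used}}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: \underline{\mathcal{S}},\ \mathrm{used}: \mathit{used},\ \rho] \\
      \quad \underline{\Rightarrow}\ [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: (\mathit{byte}, t) :: \underline{\mathcal{S}},\ \mathrm{used}: \mathit{used} \cup \{t\},\ \rho]
      \end{array}}
\tag{30}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{iadd} \qquad t \notin \mathit{used}}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: v_1 :: v_2 :: \underline{\mathcal{S}},\ \mathrm{used}: \mathit{used},\ \rho] \\
      \quad \underline{\Rightarrow}\ [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: (\mathit{Val}(v_1) + \mathit{Val}(v_2), t) :: \underline{\mathcal{S}},\ \mathrm{used}: \mathit{used} \cup \{t\},\ \rho]
      \end{array}}
\tag{31}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{load}\ \mathit{index}}
     {[\mathrm{pc}: pc,\ \mathrm{wstack}: \underline{\mathcal{S}},\ \mathrm{locals}: \underline{\mathcal{L}},\ \rho]\ \underline{\Rightarrow}\ [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \underline{\mathcal{L}}(\mathit{index}) :: \underline{\mathcal{S}},\ \mathrm{locals}: \underline{\mathcal{L}},\ \rho]}
\tag{32}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{store}\ \mathit{index}}
     {[\mathrm{pc}: pc,\ \mathrm{wstack}: v :: \underline{\mathcal{S}},\ \mathrm{locals}: \underline{\mathcal{L}},\ \rho]\ \underline{\Rightarrow}\ [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \underline{\mathcal{S}},\ \mathrm{locals}: \underline{\mathcal{L}}[\mathit{index} \mapsto v],\ \rho]}
\tag{33}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{if\_cmpeq}\ \mathit{offset} \qquad \mathit{Val}(v) \neq 0}
     {[\mathrm{pc}: pc,\ \mathrm{wstack}: v :: \underline{\mathcal{S}},\ \rho]\ \underline{\Rightarrow}\ [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \underline{\mathcal{S}},\ \rho]}
\tag{34}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{if\_cmpeq}\ \mathit{offset} \qquad \mathit{Val}(v) = 0}
     {[\mathrm{pc}: pc,\ \mathrm{wstack}: v :: \underline{\mathcal{S}},\ \rho]\ \underline{\Rightarrow}\ [\mathrm{pc}: pc + \mathit{offset},\ \mathrm{wstack}: \underline{\mathcal{S}},\ \rho]}
\tag{35}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{goto}\ \mathit{offset}}
     {[\mathrm{pc}: pc,\ \rho]\ \underline{\Rightarrow}\ [\mathrm{pc}: pc + \mathit{offset},\ \rho]}
\tag{36}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{return}}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: v :: \underline{\mathcal{S}},\ \mathrm{locals}: \underline{\mathcal{L}},\ \mathrm{mstack}: (pc', \underline{\mathcal{S}}', \underline{\mathcal{L}}') :: \underline{\mathcal{M}},\ \rho] \\
      \quad \underline{\Rightarrow}\ [\mathrm{pc}: pc' + 1,\ \mathrm{wstack}: v :: \underline{\mathcal{S}}',\ \mathrm{locals}: \underline{\mathcal{L}}',\ \mathrm{mstack}: \underline{\mathcal{M}},\ \rho]
      \end{array}}
\tag{37}
\]

\[
\frac{\begin{array}{c}
      \mathrm{Instruction}(pc) = \texttt{new}\ \mathit{classID} \\
      r \notin \mathrm{dom}\ \underline{\mathcal{H}} \\
      \mathrm{dom}\ \mathit{fields} = \mathrm{dom}\ \mathit{tags} = \mathrm{dom}\ \mathrm{InitFields}(\mathit{classID}) \\
      \forall f \in \mathrm{dom}\ \mathit{fields}.\ \mathit{fields}(f) = (\mathrm{InitFields}(\mathit{classID})(f),\ \mathit{tags}(f)) \\
      \underline{\mathcal{H}}' = \underline{\mathcal{H}}[r \mapsto (\mathit{classID}, \mathit{fields})] \\
      (\{t\} \cup \mathrm{range}\ \mathit{tags}) \cap \mathit{used} = \emptyset \\
      t \notin \mathrm{range}\ \mathit{tags} \\
      \mathit{tags} \text{ is a bijection}
      \end{array}}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: \underline{\mathcal{S}},\ \mathrm{heap}: \underline{\mathcal{H}},\ \mathrm{used}: \mathit{used},\ \rho] \\
      \quad \underline{\Rightarrow}\ [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: (r, t) :: \underline{\mathcal{S}},\ \mathrm{heap}: \underline{\mathcal{H}}',\ \mathrm{used}: \mathit{used} \cup \{t\} \cup \mathrm{range}\ \mathit{tags},\ \rho]
      \end{array}}
\tag{38}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{getfield}\ \mathit{fieldID}}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{ref} :: \underline{\mathcal{S}},\ \mathrm{heap}: \underline{\mathcal{H}},\ \rho] \\
      \quad \underline{\Rightarrow}\ [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \mathrm{HeapObjFields}(\underline{\mathcal{H}}(\mathit{Val}(\mathit{ref})))(\mathit{fieldID}) :: \underline{\mathcal{S}},\ \mathrm{heap}: \underline{\mathcal{H}},\ \rho]
      \end{array}}
\tag{39}
\]

\[
\frac{\begin{array}{c}
      \mathrm{Instruction}(pc) = \texttt{putfield}\ \mathit{fieldID} \\
      \mathit{classID} = \mathrm{HeapObjClass}(\underline{\mathcal{H}}(\mathit{Val}(\mathit{ref}))) \\
      \mathit{fields} = \mathrm{HeapObjFields}(\underline{\mathcal{H}}(\mathit{Val}(\mathit{ref}))) \\
      \mathit{fieldID} \in \mathrm{dom}\ \mathrm{InitFields}(\mathit{classID})
      \end{array}}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: v :: \mathit{ref} :: \underline{\mathcal{S}},\ \mathrm{heap}: \underline{\mathcal{H}},\ \rho] \\
      \quad \underline{\Rightarrow}\ [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \underline{\mathcal{S}},\ \mathrm{heap}: \underline{\mathcal{H}}[\mathit{Val}(\mathit{ref}) \mapsto (\mathit{classID}, \mathit{fields}[\mathit{fieldID} \mapsto v])],\ \rho]
      \end{array}}
\tag{40}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{getstatic}\ \mathit{fieldID}}
     {[\mathrm{pc}: pc,\ \mathrm{wstack}: \underline{\mathcal{S}},\ \mathrm{globals}: \underline{\mathcal{G}},\ \rho]\ \underline{\Rightarrow}\ [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \underline{\mathcal{G}}(\mathit{fieldID}) :: \underline{\mathcal{S}},\ \mathrm{globals}: \underline{\mathcal{G}},\ \rho]}
\tag{41}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{putstatic}\ \mathit{fieldID} \qquad \mathit{fieldID} \in \mathrm{dom}\ \underline{\mathcal{G}}}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: v :: \underline{\mathcal{S}},\ \mathrm{globals}: \underline{\mathcal{G}},\ \rho] \\
      \quad \underline{\Rightarrow}\ [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \underline{\mathcal{S}},\ \mathrm{globals}: \underline{\mathcal{G}}[\mathit{fieldID} \mapsto v],\ \rho]
      \end{array}}
\tag{42}
\]

\[
\frac{\begin{array}{c}
      \mathrm{Instruction}(pc) = \texttt{invokevirtual}\ \mathit{methodID} \\
      pc' = (\mathrm{Dispatch}(\mathrm{HeapObjClass}(\underline{\mathcal{H}}(\mathit{Val}(v_0))), \mathit{methodID}), 0)
      \end{array}}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: v_1 :: v_0 :: \underline{\mathcal{S}},\ \mathrm{locals}: \underline{\mathcal{L}},\ \mathrm{mstack}: \underline{\mathcal{M}},\ \mathrm{heap}: \underline{\mathcal{H}},\ \rho] \\
      \quad \underline{\Rightarrow}\ [\mathrm{pc}: pc',\ \mathrm{wstack}: \varepsilon,\ \mathrm{locals}: [0 \mapsto v_0, 1 \mapsto v_1],\ \mathrm{mstack}: (pc, \underline{\mathcal{S}}, \underline{\mathcal{L}}) :: \underline{\mathcal{M}},\ \mathrm{heap}: \underline{\mathcal{H}},\ \rho]
      \end{array}}
\tag{43}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{invokestatic}\ \mathit{methodImpl} \qquad pc' = (\mathit{methodImpl}, 0)}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: v_1 :: v_0 :: \underline{\mathcal{S}},\ \mathrm{locals}: \underline{\mathcal{L}},\ \mathrm{mstack}: \underline{\mathcal{M}},\ \rho] \\
      \quad \underline{\Rightarrow}\ [\mathrm{pc}: pc',\ \mathrm{wstack}: \varepsilon,\ \mathrm{locals}: [0 \mapsto v_0, 1 \mapsto v_1],\ \mathrm{mstack}: (pc, \underline{\mathcal{S}}, \underline{\mathcal{L}}) :: \underline{\mathcal{M}},\ \rho]
      \end{array}}
\tag{44}
\]

\[
\frac{\begin{array}{c}
      \mathrm{Instruction}(pc) = \texttt{checkcast}\ \mathit{classID} \\
      \mathit{Val}(\mathit{ref}) = \mathit{null} \ \vee\ \mathrm{HeapObjClass}(\underline{\mathcal{H}}(\mathit{Val}(\mathit{ref}))) \in \mathrm{SubclassesOf}(\mathit{classID})
      \end{array}}
     {[\mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{ref} :: \underline{\mathcal{S}},\ \mathrm{heap}: \underline{\mathcal{H}},\ \rho]\ \underline{\Rightarrow}\ [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: \mathit{ref} :: \underline{\mathcal{S}},\ \mathrm{heap}: \underline{\mathcal{H}},\ \rho]}
\tag{45}
\]

\[
\frac{\begin{array}{c}
      \mathrm{Instruction}(pc) = \texttt{instanceof}\ \mathit{classID} \\
      \mathrm{HeapObjClass}(\underline{\mathcal{H}}(\mathit{Val}(\mathit{ref}))) \in \mathrm{SubclassesOf}(\mathit{classID}) \\
      t \notin \mathit{used}
      \end{array}}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{ref} :: \underline{\mathcal{S}},\ \mathrm{heap}: \underline{\mathcal{H}},\ \mathrm{used}: \mathit{used},\ \rho] \\
      \quad \underline{\Rightarrow}\ [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: (1, t) :: \underline{\mathcal{S}},\ \mathrm{heap}: \underline{\mathcal{H}},\ \mathrm{used}: \mathit{used} \cup \{t\},\ \rho]
      \end{array}}
\tag{46}
\]

\[
\frac{\begin{array}{c}
      \mathrm{Instruction}(pc) = \texttt{instanceof}\ \mathit{classID} \\
      \mathit{Val}(\mathit{ref}) = \mathit{null} \ \vee\ \mathrm{HeapObjClass}(\underline{\mathcal{H}}(\mathit{Val}(\mathit{ref}))) \notin \mathrm{SubclassesOf}(\mathit{classID}) \\
      t \notin \mathit{used}
      \end{array}}
     {\begin{array}{l}
      [\mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{ref} :: \underline{\mathcal{S}},\ \mathrm{heap}: \underline{\mathcal{H}},\ \mathrm{used}: \mathit{used},\ \rho] \\
      \quad \underline{\Rightarrow}\ [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: (0, t) :: \underline{\mathcal{S}},\ \mathrm{heap}: \underline{\mathcal{H}},\ \mathrm{used}: \mathit{used} \cup \{t\},\ \rho]
      \end{array}}
\tag{47}
\]

\[
\frac{\mathrm{Instruction}(pc) = \texttt{athrow} \qquad \mathit{Val}(\mathit{ref}) \neq \mathit{null}}
     {\begin{array}{l}
      [\mathrm{mode}: \mathrm{RUNNING},\ \mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{ref} :: \underline{\mathcal{S}},\ \rho] \\
      \quad \underline{\Rightarrow}\ [\mathrm{mode}: \mathrm{THROWING},\ \mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{ref} :: \varepsilon,\ \rho]
      \end{array}}
\tag{48}
\]

\[
\frac{\begin{array}{c}
      \mathit{classID} \in \mathrm{ErrorClassIDs} \\
      r \notin \mathrm{dom}\ \underline{\mathcal{H}} \\
      \mathrm{dom}\ \mathit{fields} = \mathrm{dom}\ \mathit{tags} = \mathrm{dom}\ \mathrm{InitFields}(\mathit{classID}) \\
      \forall f \in \mathrm{dom}\ \mathit{fields}.\ \mathit{fields}(f) = (\mathrm{InitFields}(\mathit{classID})(f),\ \mathit{tags}(f)) \\
      \underline{\mathcal{H}}' = \underline{\mathcal{H}}[r \mapsto (\mathit{classID}, \mathit{fields})] \\
      (\{t\} \cup \mathrm{range}\ \mathit{tags}) \cap \mathit{used} = \emptyset \\
      t \notin \mathrm{range}\ \mathit{tags} \\
      \mathit{tags} \text{ is a bijection}
      \end{array}}
     {\begin{array}{l}
      [\mathrm{mode}: \mathrm{RUNNING},\ \mathrm{pc}: pc,\ \mathrm{wstack}: \underline{\mathcal{S}},\ \mathrm{heap}: \underline{\mathcal{H}},\ \mathrm{used}: \mathit{used},\ \rho] \\
      \quad \underline{\Rightarrow}\ [\mathrm{mode}: \mathrm{THROWING},\ \mathrm{pc}: pc,\ \mathrm{wstack}: (r, t) :: \varepsilon,\ \mathrm{heap}: \underline{\mathcal{H}}',\ \mathrm{used}: \mathit{used} \cup \{t\} \cup \mathrm{range}\ \mathit{tags},\ \rho]
      \end{array}}
\tag{49}
\]

\[
\frac{\mathit{handler} = \mathrm{CatchBlockOffset}((\mathit{method}, \mathit{offset}), \mathrm{HeapObjClass}(\underline{\mathcal{H}}(\mathit{Val}(\mathit{ref}))))}
     {\begin{array}{l}
      [\mathrm{mode}: \mathrm{THROWING},\ \mathrm{pc}: (\mathit{method}, \mathit{offset}),\ \mathrm{wstack}: \mathit{ref} :: \varepsilon,\ \mathrm{heap}: \underline{\mathcal{H}},\ \rho] \\
      \quad \underline{\Rightarrow}\ [\mathrm{mode}: \mathrm{RUNNING},\ \mathrm{pc}: (\mathit{method}, \mathit{handler}),\ \mathrm{wstack}: \mathit{ref} :: \varepsilon,\ \mathrm{heap}: \underline{\mathcal{H}},\ \rho]
      \end{array}}
\tag{50}
\]

\[
\frac{((\mathit{method}, \mathit{offset}), \mathrm{HeapObjClass}(\underline{\mathcal{H}}(\mathit{Val}(\mathit{ref})))) \notin \mathrm{dom}\ \mathrm{CatchBlockOffset}}
     {\begin{array}{l}
      [\mathrm{mode}: \mathrm{THROWING},\ \mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{ref} :: \varepsilon,\ \mathrm{locals}: \underline{\mathcal{L}},\ \mathrm{mstack}: (pc', \underline{\mathcal{S}}', \underline{\mathcal{L}}') :: \underline{\mathcal{M}},\ \mathrm{heap}: \underline{\mathcal{H}},\ \rho] \\
      \quad \underline{\Rightarrow}\ [\mathrm{mode}: \mathrm{THROWING},\ \mathrm{pc}: pc',\ \mathrm{wstack}: \mathit{ref} :: \varepsilon,\ \mathrm{locals}: \underline{\mathcal{L}}',\ \mathrm{mstack}: \underline{\mathcal{M}},\ \mathrm{heap}: \underline{\mathcal{H}},\ \rho]
      \end{array}}
\tag{51}
\]

Figure 3-5. Rules defining the tagged transition relation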


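The following two lemmas express the fact that executions in the tagged semantics mirror
executions in the untagged semantics.

Lemma 3-1. $\forall \underline{\Sigma}_1, \underline{\Sigma}_2.\ \ \underline{\Sigma}_1\ \underline{\Rightarrow}\ \underline{\Sigma}_2 \implies \mathit{Untag}(\underline{\Sigma}_1) \Rightarrow \mathit{Untag}(\underline{\Sigma}_2)$

Lemma 3-2. $\forall \underline{\Sigma}_1, \Sigma_2.\ \ \mathit{Untag}(\underline{\Sigma}_1) \Rightarrow \Sigma_2 \implies (\exists \underline{\Sigma}_2.\ \mathit{Untag}(\underline{\Sigma}_2) = \Sigma_2 \wedge \underline{\Sigma}_1\ \underline{\Rightarrow}\ \underline{\Sigma}_2)$

The proofs are by case analysis of the hypothesized transition relation. I present one case
for the proof of each lemma to illustrate the form of the proofs.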

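Proof of Lemma 3-1: Suppose $\underline{\Sigma}_1\ \underline{\Rightarrow}\ \underline{\Sigma}_2$ and consider the case in which the transition is
justified by the iadd rule. From the iadd tagged transition rule,

$\underline{\Sigma}_1 = [\mathrm{pc}: pc,\ \mathrm{wstack}: v_1 :: v_2 :: \underline{\mathcal{S}},\ \mathrm{used}: \mathit{used},\ \rho]$

$\underline{\Sigma}_2 = [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: (\mathit{Val}(v_1) + \mathit{Val}(v_2), t) :: \underline{\mathcal{S}},\ \mathrm{used}: \mathit{used} \cup \{t\},\ \rho]$

$\mathrm{Instruction}(pc) = \texttt{iadd}$

Then

$\mathit{Untag}(\underline{\Sigma}_1) = [\mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{Val}(v_1) :: \mathit{Val}(v_2) :: \mathit{Untag}_S(\underline{\mathcal{S}}),\ \mathit{Untag}_\rho(\rho)]$

$\mathit{Untag}(\underline{\Sigma}_2) = [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: (\mathit{Val}(v_1) + \mathit{Val}(v_2)) :: \mathit{Untag}_S(\underline{\mathcal{S}}),\ \mathit{Untag}_\rho(\rho)]$

Hence $\mathit{Untag}(\underline{\Sigma}_1) \Rightarrow \mathit{Untag}(\underline{\Sigma}_2)$ as required.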

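Proof of Lemma 3-2: Suppose $\mathit{Untag}(\underline{\Sigma}_1) \Rightarrow \Sigma_2$ and consider the iadd case.

$\mathit{Untag}(\underline{\Sigma}_1) = [\mathrm{pc}: pc,\ \mathrm{wstack}: v_1 :: v_2 :: \mathcal{S},\ \rho]$

$\Sigma_2 = [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: (v_1 + v_2) :: \mathcal{S},\ \rho]$

$\mathrm{Instruction}(pc) = \texttt{iadd}$

By the definition of Untag, $\underline{\Sigma}_1$ must be of the form

$\underline{\Sigma}_1 = [\mathrm{pc}: pc,\ \mathrm{wstack}: w_1 :: w_2 :: \underline{\mathcal{S}},\ \mathrm{used}: \mathit{used},\ \rho']$

where

$\mathit{Val}(w_1) = v_1$

$\mathit{Val}(w_2) = v_2$

$\mathit{Untag}_S(\underline{\mathcal{S}}) = \mathcal{S}$

$\mathit{Untag}_\rho(\rho') = \rho$

Now let $t$ be any tag such that $t \notin \mathit{used}$. Such a tag always exists because the set of tags is
infinite and the used set is always finite. Set

$\underline{\Sigma}_2 = [\mathrm{pc}: pc + 1,\ \mathrm{wstack}: (\mathit{Val}(w_1) + \mathit{Val}(w_2), t) :: \underline{\mathcal{S}},\ \mathrm{used}: \mathit{used} \cup \{t\},\ \rho']$

Then $\mathit{Untag}(\underline{\Sigma}_2) = \Sigma_2$ and $\underline{\Sigma}_1\ \underline{\Rightarrow}\ \underline{\Sigma}_2$, as required.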
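The iadd case of both lemmas can be checked on a concrete state. The sketch below is an illustration only: states carry just a program counter, a working stack and (for tagged states) a used set, tagged values are (value, tag) pairs, and the function names are hypothetical.

```python
def untag(state):
    """Strip tags from a tagged state (working stack only, in this sketch)."""
    return {"pc": state["pc"], "wstack": [v for v, _ in state["wstack"]]}


def iadd(state):
    """Untagged iadd rule."""
    v1, v2, *rest = state["wstack"]
    return {"pc": state["pc"] + 1, "wstack": [v1 + v2] + rest}


def tagged_iadd(state, t):
    """Tagged iadd rule with the fresh tag t supplied by the caller."""
    if t in state["used"]:
        raise ValueError("t must be fresh")
    (a, _), (b, _), *rest = state["wstack"]
    return {"pc": state["pc"] + 1, "wstack": [(a + b, t)] + rest,
            "used": state["used"] | {t}}


s1 = {"pc": 4, "wstack": [(2, 10), (3, 11), (7, 12)], "used": {10, 11, 12}}
s2 = tagged_iadd(s1, t=13)
# Lemma 3-1, iadd case: untagging commutes with the transition.
commutes = untag(s2) == iadd(untag(s1))
```

Lemma 3-2 corresponds to the observation that the choice of t does not matter: any tag outside the used set yields a tagged successor whose untagging is the same untagged state.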

3.4.5  Correspondence  of  Traces 

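Define UntagT over traces as follows:

$\mathit{UntagT}(\langle \underline{\Sigma}_0, \ldots, \underline{\Sigma}_n \rangle) = \langle \mathit{Untag}(\underline{\Sigma}_0), \ldots, \mathit{Untag}(\underline{\Sigma}_n) \rangle$.

Lemma 3-3. For any tagged trace $\underline{T}$, $\mathit{UntagT}(\underline{T})$ is a trace. Furthermore, for any trace $T$,
there is a tagged trace $\underline{T}$ such that $\mathit{UntagT}(\underline{T}) = T$.

Proof: The proofs are by induction on the length of the traces.

Consider a tagged trace $\underline{T} = \langle \underline{\Sigma}_0, \ldots, \underline{\Sigma}_n \rangle$. In the base case the trace consists of the single
state $\underline{\Sigma}_0$, so $\mathit{UntagT}(\underline{T}) = \langle \mathit{Untag}(\underline{\Sigma}_0) \rangle$. From the definition of the initial state $\underline{\Sigma}_0$, it
follows that $\mathit{Untag}(\underline{\Sigma}_0)$ is the initial state for the untagged semantics, hence
$\langle \mathit{Untag}(\underline{\Sigma}_0) \rangle$ is a trace.

For longer traces, by the induction hypothesis $\langle \mathit{Untag}(\underline{\Sigma}_0), \ldots, \mathit{Untag}(\underline{\Sigma}_{n-1}) \rangle$ is a trace. It
is required to prove that $\mathit{Untag}(\underline{\Sigma}_{n-1}) \Rightarrow \mathit{Untag}(\underline{\Sigma}_n)$. This follows immediately from
$\underline{\Sigma}_{n-1}\ \underline{\Rightarrow}\ \underline{\Sigma}_n$ and Lemma 3-1.

Now consider an untagged trace $T = \langle \Sigma_0, \ldots, \Sigma_n \rangle$. In the base case, set $\underline{T} = \langle \underline{\Sigma}_0 \rangle$, where
$\underline{\Sigma}_0$ is the initial state for the tagged semantics. As above, $\mathit{UntagT}(\underline{T}) = \langle \Sigma_0 \rangle = T$.

For longer traces, by the induction hypothesis there exists a tagged trace
$\underline{T}' = \langle \underline{\Sigma}_0, \ldots, \underline{\Sigma}_{n-1} \rangle$ such that
$\langle \mathit{Untag}(\underline{\Sigma}_0), \ldots, \mathit{Untag}(\underline{\Sigma}_{n-1}) \rangle = \langle \Sigma_0, \ldots, \Sigma_{n-1} \rangle$. Substituting $\mathit{Untag}(\underline{\Sigma}_{n-1}) = \Sigma_{n-1}$
and $\Sigma_{n-1} \Rightarrow \Sigma_n$ into Lemma 3-2, one obtains $\exists \underline{\Sigma}_n.\ \mathit{Untag}(\underline{\Sigma}_n) = \Sigma_n \wedge \underline{\Sigma}_{n-1}\ \underline{\Rightarrow}\ \underline{\Sigma}_n$.
Setting $\underline{T} = \langle \underline{\Sigma}_0, \ldots, \underline{\Sigma}_n \rangle$ then gives the required result.

3.4.6 Defining the VPR Using Tags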

Figure  3-6  defines  evaluation  of  bytecode  expressions  in  tagged  states.  The  rules  are 
analogous  to  the  rules  for  untagged  states.  The  only  significant  difference  is  that  in 
Figure  3-6,  in  the  rule  for  field  dereferences,  the  object  expression  is  evaluated  to  yield  the 
tagged value $(u, t)$, where $u$ is the actual object reference and $t$ is the tag, and the tag is
ignored.


Figure  3-6.  Rules  defining  the  evaluation  of  bytecode  expressions  in  tagged  states 


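A tagged trace $\underline{T}$ of a program $P$ is a sequence of tagged states $\langle \underline{\Sigma}_0, \ldots, \underline{\Sigma}_n \rangle$ such that $\underline{\Sigma}_0$
is the initial program state for program $P$, and $\forall\, 0 < i \le n.\ \underline{\Sigma}_{i-1}\ \underline{\Rightarrow}\ \underline{\Sigma}_i$.

Let $e_1$ and $e_2$ be bytecode expressions. Define the value-point relation $\leftrightarrow_P$ of a program $P$
as follows:

$e_1 \leftrightarrow_P e_2$ iff

$\exists$ a tagged trace $\underline{T}$ of $P$ and tagged states $\underline{\Sigma}_i$ and $\underline{\Sigma}_j$ in $\underline{T}$, such that $(\underline{\Sigma}_i, e_1) \Downarrow (u, t)$ and
$(\underline{\Sigma}_j, e_2) \Downarrow (u, t)$ for some tagged value $(u, t)$, where $u$ is not equal to null.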

This  is  the  definition  actually  used  in  the  remainder  of  the  thesis,  including  the  rest  of  this 
chapter. 

3.5  Examples  of  Using  the  Value-Point  Relation 

This  section  presents  some  examples  of  extracting  useful  information  from  the  VPR. 

3.5.1  Finding  Writers  to  a  Field 

Consider  the  following  problem: 

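“Given a program P and the pc of a getfield instruction, find all code locations pc' of
the putfield instructions that put values into the field being read.”

This question can be formalized as the following set comprehension:

\[
\begin{array}{l}
\{\, pc' \mid \exists \text{ a trace } T \text{ of } P = \langle \Sigma_0, \ldots, \Sigma_n \rangle. \\
\quad \exists p, q, \mathit{objref}, \mathcal{S}, \mathit{val}, \mathcal{S}', \mathit{field}, \rho, \rho'.\ \ \mathit{Val}(\mathit{objref}) \neq \mathit{null}\ \wedge \\
\quad \Sigma_p = [\mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{objref} :: \mathcal{S},\ \rho] \wedge \mathrm{Instruction}(pc) = \texttt{getfield}\ \mathit{field}\ \wedge \\
\quad \Sigma_q = [\mathrm{pc}: pc',\ \mathrm{wstack}: \mathit{val} :: \mathit{objref} :: \mathcal{S}',\ \rho'] \wedge \mathrm{Instruction}(pc') = \texttt{putfield}\ \mathit{field} \,\}
\end{array}
\]

This set is equal to

\[
\begin{array}{l}
\{\, pc' \mid \exists \mathit{field}.\ \ pc : \texttt{stack-}0 \leftrightarrow_P pc' : \texttt{stack-}1\ \wedge \\
\quad \mathrm{Instruction}(pc) = \texttt{getfield}\ \mathit{field} \wedge \mathrm{Instruction}(pc') = \texttt{putfield}\ \mathit{field} \,\}
\end{array}
\]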

The  translation  erases  all  mention  of  dynamic  properties,  summarizing  them  with  the  static 
VPR. 
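A tool built on this formulation needs only the program text and a VPR oracle. The sketch below assumes a hypothetical encoding in which the program is a dict from code locations to (opcode, operand) pairs and the oracle is a function related(e1, e2) over (location, root) pairs; neither is the actual Ajax interface.

```python
def writers_to_field(pc, instruction, related):
    """Locations pc' of putfield instructions that may write what pc reads.

    `instruction` maps a location to (opcode, operand); `related(e1, e2)`
    answers whether two bytecode expressions are related by some
    conservative approximation of the VPR.
    """
    op, field = instruction[pc]
    if op != "getfield":
        raise ValueError("pc must designate a getfield instruction")
    return {
        pc2
        for pc2, (op2, field2) in instruction.items()
        if op2 == "putfield" and field2 == field
        # object read from (top of stack at pc) vs. object written to
        # (second stack slot at pc2, beneath the value being stored)
        and related((pc, "stack-0"), (pc2, "stack-1"))
    }


instruction = {10: ("getfield", "A.f"), 20: ("putfield", "A.f"),
               30: ("putfield", "A.f"), 40: ("putfield", "B.g")}
pairs = {((10, "stack-0"), (20, "stack-1")), ((10, "stack-0"), (40, "stack-1"))}
related = lambda a, b: (a, b) in pairs or (b, a) in pairs
writers = writers_to_field(10, instruction, related)
```

Location 30 writes the same field but to an unrelated object, and location 40 writes to a related object but a different field, so neither is reported.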

3.5.2  Downcast  Checking 

Consider  the  following  problem: 

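“Find all program locations pc corresponding to checkcast instructions which might
fail.”

This can be formulated as

\[
\begin{array}{l}
\{\, pc \mid \exists \text{ a trace } T \text{ of } P = \langle \Sigma_0, \ldots, \Sigma_n \rangle.\ \exists p, \mathit{objref}, \mathcal{S}, \mathit{class}, \mathcal{H}, \rho.\ \ \mathit{Val}(\mathit{objref}) \neq \mathit{null}\ \wedge \\
\quad \Sigma_p = [\mathrm{pc}: pc,\ \mathrm{wstack}: \mathit{objref} :: \mathcal{S},\ \mathrm{heap}: \mathcal{H},\ \rho]\ \wedge \\
\quad \mathrm{Instruction}(pc) = \texttt{checkcast}\ \mathit{class}\ \wedge \\
\quad \mathrm{HeapObjClass}(\mathcal{H}(\mathit{Val}(\mathit{objref}))) \notin \mathrm{SubclassesOf}(\mathit{class}) \,\}
\end{array}
\]

This can be rewritten to use the value-point relation:

\[
\begin{array}{l}
\{\, pc \mid \exists pc', \mathit{class}, \mathit{class}'. \\
\quad pc : \texttt{stack-}0 \leftrightarrow_P pc' : \texttt{stack-}0\ \wedge \\
\quad \mathrm{Instruction}(pc) = \texttt{checkcast}\ \mathit{class}\ \wedge \\
\quad \mathrm{Instruction}(pc' - 1) = \texttt{new}\ \mathit{class}'\ \wedge \\
\quad \mathit{class}' \notin \mathrm{SubclassesOf}(\mathit{class}) \,\}
\end{array}
\]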

In this example, the translation is exact; a downcast can fail if and only if some instruction
creates an object which reaches the downcast instruction and which is incompatible with
the required bound. Thus, if the true value-point relation is known, the unsafe downcasts
can be determined precisely. Of course, in general an analysis can only compute an
approximation to the true relation.
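The rewritten formulation translates almost directly into a query. The sketch below makes the same assumptions as the formal rule (every instruction has unit length, so the object created at a new instruction is on top of the stack at the following location) and uses a hypothetical encoding: a dict of (opcode, class) instructions, a related oracle over (location, root) pairs, and a subclasses_of function returning a set of class names.

```python
def possibly_failing_casts(instruction, related, subclasses_of):
    """checkcast locations that some incompatible `new` object may reach."""
    news = {pc: cls for pc, (op, cls) in instruction.items() if op == "new"}
    failing = set()
    for pc, (op, bound) in instruction.items():
        if op != "checkcast":
            continue
        for new_pc, cls in news.items():
            # The object created at new_pc is on top of the stack at new_pc + 1.
            incompatible = cls not in subclasses_of(bound)
            if incompatible and related((pc, "stack-0"), (new_pc + 1, "stack-0")):
                failing.add(pc)
    return failing


instruction = {0: ("new", "Object"), 5: ("new", "String"),
               9: ("checkcast", "String"), 12: ("checkcast", "String")}
hierarchy = {"Object": {"Object", "String"}, "String": {"String"}}
pairs = {((9, "stack-0"), (1, "stack-0")), ((12, "stack-0"), (6, "stack-0"))}
related = lambda a, b: (a, b) in pairs or (b, a) in pairs
unsafe = possibly_failing_casts(instruction, related, hierarchy.__getitem__)
```

With a conservative approximation in place of the true relation, the reported set can only grow, so a cast absent from the result is still proved safe.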

3.6  Properties  of  the  Value-Point  Relation 

The VPR is symmetric. It is not reflexive, because expressions in dead code cannot be
related to anything. It is not transitive either, in general. To see this, suppose $B_1 \leftrightarrow_P B_2$ and
$B_2 \leftrightarrow_P B_3$. The definition of the VPR implies that for some choice of variables,
$(\Sigma_i, B_1) \Downarrow v$, $(\Sigma_j, B_2) \Downarrow v$, $(\Sigma_k, B_2) \Downarrow u$, and $(\Sigma_l, B_3) \Downarrow u$. The important fact is that it is
possible for $v$ to not equal $u$ (when $\Sigma_j \neq \Sigma_k$), so there is no way in general to justify a
relationship between $B_1$ and $B_3$. For example, consider this fragment of code:

if (b) { x = y; } else { x = z; }

Let $B_1$ be y, $B_2$ be x and $B_3$ be z, all evaluated after this statement. Then this code may
execute once with b true, inducing $B_1 \leftrightarrow_P B_2$, and then execute again with b false, inducing
$B_2 \leftrightarrow_P B_3$, but y need never equal z.
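The counterexample can be replayed directly. The sketch below models only the values of x, y and z after the statement, on a single trace that passes through the fragment twice; the helper names are hypothetical and Python object identity stands in for object references.

```python
def run_fragment(b, y, z):
    """Values of x, y and z after `if (b) { x = y; } else { x = z; }`."""
    x = y if b else z
    return {"x": x, "y": y, "z": z}


def related(states, e1, e2):
    """True if e1 in some state and e2 in some state share a value."""
    return any(s1[e1] == s2[e2] for s1 in states for s2 in states)


# One trace that executes the fragment twice, with two distinct objects.
obj_y, obj_z = object(), object()
states = [run_fragment(True, obj_y, obj_z), run_fragment(False, obj_y, obj_z)]

y_x = related(states, "y", "x")   # witness: obj_y, from the first execution
x_z = related(states, "x", "z")   # witness: obj_z, from the second execution
y_z = related(states, "y", "z")   # no common witness
```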

The VPR does not explicitly encode any information about data dependence or the
direction of data flow. $B_1 \leftrightarrow_P B_2$ means that $B_1$ and $B_2$ can get the same value, but nothing
is revealed about whether the value appears at $B_1$ or $B_2$ first. In fact, it may be that no
def-use chain leads from $B_1$ to $B_2$ or vice versa; they may both be at the end of def-use chains
leading back to a common source. However, it is possible to make inferences about data
dependence in an important common case: when one of the Bs corresponds to the result of
a value creation operation, such as the result of a new instruction. In this case it is clear that
the value originated at the creation operation. This seems to be sufficient for many
applications. Defining a relation representing true directional data dependence would require a
much more complicated definition than for the VPR.

The VPR has limited context information. For example, if $B_1 \leftrightarrow_P B_2$ and the bytecode
expressions  are  both  located  in  the  same  method,  there  is  no  way  to  determine  whether  the 
two  states  justifying  the  relationship  actually  occur  during  the  same  call  to  the  method  or 
during  different  calls  to  the  method.  For  some  applications,  such  as  alias  analysis  for  code 
motion,  the  tool  is  only  interested  in  finding  aliases  that  appear  during  the  same  call  to  a 
method,  or  even  during  the  same  iteration  of  a  loop.  Thus,  these  applications  suffer  a  loss 
of  accuracy  using  the  VPR. 

The  VPR  is  simple  and  does  not  encode  information  about  context,  or  scalar  values,  or 
control  dependence,  or  many  other  aspects  of  program  behavior  that  can  be  captured  by 
static analysis. However, all these aspects can be used to improve the accuracy of an
implementation of a VPR analysis. For example, although the VPR itself encodes only limited
context information, SEMI uses context sensitive analysis to produce a better VPR
approximation.

The VPR is undecidable. In general, an analyzer can only compute a conservative
approximation to the VPR. As stated above, a conservative approximation is simply any relation
whose pairs are a superset of the pairs of the true relation. In this thesis, I write an
approximation relation for program P as <-»/>.

3.7  Extensions 

Many  tools  would  benefit  from  the  ability  to  specify  tighter  context  constraints,  such  as  the 
May  Equal  formulation  of  Boyland  and  Greenhouse  [12].  This  is  an  obvious  candidate  for 
future  work. 

Other tools require slightly different semantics for the value-point relation. For example,
for some applications it is useful to consider values to be related if they are ever compared.
This could be added to the dynamic semantics by having comparisons unify the tags of the
operands. Static analyses would then have to be adjusted to compute the correct
relationships. Ajax has been adapted to this task, but that work is beyond the scope of this thesis.
Other applications require the computed VPR approximation to satisfy certain structural
invariants, so that the tool can perform its own processing efficiently. An example of this
is the object modelling tool in Chapter 11.

The  trace  T  in  the  definition  of  the  VPR  is  required  to  range  over  all  possible  executions  of 
the  program,  which  implies  that  any  truly  conservative  approximation  to  the  VPR  will  be  a 
static  analysis.  However,  if  that  requirement  is  relaxed  so  that  T  only  ranges  over  some 
given  finite  set  of  executions  (e.g.  some  actual  runs  of  the  program  that  were  recorded), 
then the VPR can be computed by dynamic analysis. The “dynamic VPR” can be used by
the same set of tools as the static version, except that the results of the tools must be
interpreted more carefully; they are true only for the executions recorded.

4 Efficient Queries over the Value-Point Relation


4.1  Introduction 

In  the  previous  chapter,  I  defined  the  value-point  relation  as  an  abstraction  of  a  program, 
generated  by  some  analysis  and  consumed  by  some  tool.  That  discussion  focused  on  the 
mathematical  properties  of  the  relation.  In  practice,  the  analysis  cannot  simply  compute  an 
explicit  relation  and  pass  it  to  the  tool,  because  the  relation  is  infinite.  Instead,  the  tool  must 
pass  certain  parameters  to  the  analysis  indicating  which  parts  of  the  relation  must  be 
computed. In fact, for efficiency, some of the tool’s computations over the relation often
need to be performed by the analysis on the tool’s behalf, in order to exploit analysis-specific
structure. These computations are also expressed as parameters to the analysis.

The  nature  of  this  parameterization  determines  which  analysis  and  tool  combinations  will 
be  efficient  in  practice.  In  this  chapter,  I  describe  the  parameters  supported  by  Ajax  and 
their  motivation.  I  also  describe  some  general  strategies  used  by  analyses  and  tools  to 
exploit  the  parameters. 

4.2  Analysis  Parameters 

The  following  sections  explain  the  issues  that  need  to  be  addressed  by  the  parameterization 
scheme,  and  how  each  issue  is  addressed  in  Ajax.  Section  4.2.5  summarizes  the  parameters. 

4.2.1  Restricting  the  Domain  of  the  Value-Point  Relation 

Any  realistic  program  admits  an  infinite  number  of  different  bytecode  expressions.  For 
example,  for  any  n  one  can  form  a  meaningful  expression  involving  a  sequence  of  n  field 
dereferences.  The  value-point  relation  is  defined  over  all  pairs  of  bytecode  expressions  — 
not  just  those  that  appear  in  the  program  —  and  therefore  the  relation  is  infinite.  In  practice, 
however,  tools  generally  only  consider  a  finite  number  of  bytecode  expressions. 

Therefore, the simplest and most important parameter is a restriction on the domain of the
relation. A tool restricts the domain by explicitly specifying two sets of bytecode
expressions, sources S and targets T. The analysis computes the value-point relation projected
onto $S \times T$. Because the sets are given explicitly, they must be finite.

Section  3.5.1  showed  how  a  tool  could  use  the  VPR  to  find  all  writers  to  a  field.  That  tool 
would  set 

$S = \{\, pc : \texttt{stack-0} \,\}$

$T = \{\, pc' : \texttt{stack-1} \mid \mathrm{Instruction}_P(pc') = \texttt{putfield}\ \mathit{field} \,\}$

The example in Section 3.5.2 determines whether a field is always empty. It uses

$S = \{\, pc : \texttt{stack-0.field} \,\}$

$T = \{\, pc' : \texttt{stack-0} \mid \mathrm{Instruction}_P(pc'-1) = \texttt{new}\ \mathit{class} \;\lor\; \mathrm{Instruction}_P(pc'-1) = \texttt{instanceof}\ \mathit{class} \;\lor\; \mathrm{Instruction}_P(pc'-1) = \texttt{iadd} \;\lor\; (\mathrm{Instruction}_P(pc'-1) = \texttt{bipush}\ n \land n \neq 0) \,\}$

The downcast checking example in Section 3.5.2 would set

$S = \{\, pc : \texttt{stack-0} \mid \mathrm{Instruction}_P(pc-1) = \texttt{new}\ \mathit{class} \,\}$

$T = \{\, pc' : \texttt{stack-0} \mid \mathrm{Instruction}_P(pc') = \texttt{checkcast}\ \mathit{class} \,\}$

Since  the  value-point  relation  is  symmetric,  the  source  and  target  sets  are  interchangeable 
at  this  point  in  the  exposition.  The  extensions  described  below  break  this  symmetry. 

4.2.2  Avoiding  Explicit  Products 

The  downcast  checking  example  shows  that,  for  some  applications,  both  the  S  and  T  sets 
are  likely  to  be  proportional  in  size  to  the  size  of  the  program.  If  the  analysis  generates  an 
explicit projection of the relation into $S \times T$, the size of the result could grow quadratically
in  the  size  of  the  program  —  especially  if  the  analysis  is  not  very  precise. 

However,  many  tools  postprocess  the  projected  relation  to  compute  some  final  result  that 
is  much  smaller  than  the  relation  itself.  For  example,  the  downcast  checker  computes  just 
one  bit  of  information  per  element  of  T  —  whether  or  not  the  downcast  is  safe.  Furthermore 
any  scalable  analysis  must  be  able  to  represent  its  internal  data  in  space  subquadratic  in  the 
size  of  the  program.  For  efficiency,  Ajax  maps  the  tool’s  computation  directly  onto  the 
internal  data  structures  of  the  analysis,  without  requiring  an  explicit  representation  of  the 
VPR  approximation.  Of  course  this  must  be  done  with  only  minimal  assumptions  about  the 
form  of  that  structure. 

To this end, I adapted and generalized an idea from Heintze and McAllester’s work on
subtransitive control flow analysis [41]. The idea is to suppose that the implementation of
the analysis builds a directed graph G with the following properties:

• There is a map $G_S$ from S to the nodes of G.

• There is a map $G_T$ from T to the nodes of G.

• The analysis indicates $s \leftrightarrow_P t$ if and only if there is a path from $G_S(s)$ to $G_T(t)$ in G.

In  Chapter  5  and  Section  6.6  I  explain  how  such  a  graph  is  constructed  by  RTA  and  SEMI 
respectively. 

Many  tools  can  exploit  this  graph  structure.  Suppose  a  tool  needs  to  compute: 

$\{\, (t, F[\{ s \in S \mid s \leftrightarrow_P t \}]) \mid t \in T \,\}$

where  F  is  some  function  specific  to  the  tool.  Then  if  F  satisfies  a  certain  lattice-like 
condition  described  below,  the  set  of  results  can  be  computed  by  exploiting  the  graph. 
Conceptually,  each  node  corresponding  to  a  source  s  is  first  associated  with  an  initial  value 
F[{  s  }].  These  values  are  then  propagated  along  the  graph  edges  and  merged  when  they 
meet at nodes. The result for each target t is read from the final value associated with the
node  corresponding  to  t.  This  process  is  similar  in  flavor  to  dataflow  analysis. 
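
The propagation just described can be illustrated with a small worklist loop. This is only a sketch under stated assumptions: the class `GraphPropagation`, its method `propagate`, and the use of a worklist are hypothetical choices made for illustration, not the Ajax implementation, and the merge operator is assumed to be commutative, associative, and idempotent over a finite domain so that the loop terminates.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical helper (not part of Ajax): pushes values along graph edges.
final class GraphPropagation {
    /**
     * edges:   successor lists of the analysis graph G
     * initial: the value F[{s}] for each node that corresponds to a source s
     * merge:   the merge operator; bottom is its identity
     * Returns the merged value of all sources that reach each node.
     */
    static <N, D> Map<N, D> propagate(Map<N, List<N>> edges, Map<N, D> initial,
                                      BinaryOperator<D> merge, D bottom) {
        Map<N, D> value = new HashMap<>(initial);
        Deque<N> worklist = new ArrayDeque<>(initial.keySet());
        while (!worklist.isEmpty()) {
            N node = worklist.remove();
            D here = value.getOrDefault(node, bottom);
            for (N succ : edges.getOrDefault(node, List.of())) {
                D old = value.getOrDefault(succ, bottom);
                D merged = merge.apply(old, here);
                if (!merged.equals(old)) {   // value grew: revisit successors
                    value.put(succ, merged);
                    worklist.add(succ);
                }
            }
        }
        return value;
    }
}
```

The result for a target t would then be read from the entry for the node that t maps to; a node with no entry was reached by no source and implicitly holds the identity value.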

For  example,  consider  the  downcast  checking  tool.  Let  the  function  F  be  defined  as: 

$F[\{\, pc_1 : \texttt{stack-0},\ pc_2 : \texttt{stack-0},\ \ldots,\ pc_n : \texttt{stack-0} \,\}]$

= the most specific common superclass of the classes instantiated at
$pc_1 - 1,\ pc_2 - 1,\ \ldots,\ pc_n - 1$

Consider  the  code  in  Figure  4-1.  A  simple  dataflow  analysis  would  produce  the  graph  in 
Figure  4-2. 


    static void main() {
        Object a = new Integer();
        Object b = new String("Hello");
        Object c = new String("Kitty");
        Object d;
        if (...) {
            d = a;
        } else {
            d = b;
        }
        Object e;
        if (...) {
            e = b;
        } else {
            e = c;
        }
        Object f;
        if (...) {
            f = d;
        } else {
            f = e;
        }
        Object h = a;
        Object i = e;
        (Integer)h;
        (Integer)f;
        (String)i;
    }

Figure  4-1.  Example  of  Java  code  exhibiting  aliasing 


[Figure 4-2 graph: source nodes s1 (new Integer), s2 (new String), and s3 (new String);
target nodes t1 (checkcast Integer), t2 (checkcast Integer), and t3 (checkcast String).
The underlined F values shown on the nodes include Integer, String, and Object.]



Figure  4-2.  Example  of  an  analysis  graph  used  by  the  downcast  checking  tool 


The downcast checking system finds three new instructions in the program, corresponding
to $s_1$, $s_2$, and $s_3$, and three checkcast instructions, corresponding to $t_1$, $t_2$, and $t_3$, as
shown. For each node N in the graph, it computes F applied to the set of the sources that reach N.
This can be done efficiently because the value of F at each node (other than a source node)
can be computed from the F of its predecessors in the graph: it is the most specific
common superclass of the classes at the predecessors. The computed F values are
underlined.

Once the downcast checker has determined the most specific common superclass of the
classes of the objects that may reach a given downcast instruction, it compares that
superclass with the bound specified in the checkcast instruction. If the actual superclass is a
subclass of the bound (or equal to it) then the cast cannot fail. If the actual superclass is not
a subclass of the bound, then the analysis has identified at least one class whose objects
appear to reach the downcast instruction but which is not compatible with the bound. For
more details, see Chapter 10.

This  approach  improves  efficiency  because  the  space  required  is  only  linear  in  the  size  of 
the  analysis’  graph,  instead  of  proportional  to  the  product  of  the  size  of  S  and  the  size  of  T. 

It  is  tempting  to  assign  semantics  to  the  graphs.  For  example,  it  seems  natural  to  interpret 
Figure  4-2  as  a  dataflow  graph,  in  which  objects  of  various  classes  flow  from  their  creation 
sites  to  the  sites  of  the  downcast  instructions,  and  the  nodes  represent  intermediate  sites  in 
def-use  chains.  This  interpretation  may  be  correct  for  some  analyses,  but  it  would  be 
mistaken in general. Without referring to a specific analysis, all one can say about the
graphs is that they are encodings of the computed VPR approximation, as defined above:
“$s \leftrightarrow_P t$ if and only if there is a path from $G_S(s)$ to $G_T(t)$ in G”.

4.2.3  General  Framework 

The lattice-like property required of F is quite simple. There must exist a binary function
$D_M$ such that, for any two sets of source bytecode expressions P and Q,

$F[P \cup Q] = D_M(F[P], F[Q])$

The existence of this merge operator ensures that the result of F can be constructed
incrementally.

Rather  than  passing  graph  structures  from  analyses  to  tools  across  the  Ajax  interface,  Ajax 
tools  pass  their  F  functions  to  the  analyses.  This  reduces  the  burden  on  tool  implementors. 

A  tool  reveals  its  F  function  to  analyses  by  passing  in  the  following  parameters: 

• The type D of intermediate data (F’s result type)

• The merge operator $D_M : D \times D \to D$

• The identity $D_E = F[\{\}]$

• The initial assignment $D_I : S \to D$, such that $D_I(s) = F[\{s\}]$

These parameters fully determine F, for F can be computed as follows:

$F[\{\}] = D_E$

$F[\{s\} \cup Q] = D_M(D_I(s), F[Q])$

The  correctness  of  this  computation  follows  from  the  lattice-like  property  of  F,  by  induction 
over  the  size  of  F’s  argument  set. 
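
The two defining equations amount to a fold over the argument set. The following minimal sketch shows this; the class `FParams` and its field names are hypothetical and are introduced only to make the roles of the identity, the merge operator, and the initial assignment concrete.

```java
import java.util.Set;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical container for the parameters that describe a tool's F function.
final class FParams<S, D> {
    final BinaryOperator<D> merge;   // the merge operator D_M
    final D identity;                // the identity D_E, i.e. F applied to the empty set
    final Function<S, D> initial;    // the initial assignment D_I, i.e. F applied to {s}

    FParams(BinaryOperator<D> merge, D identity, Function<S, D> initial) {
        this.merge = merge;
        this.identity = identity;
        this.initial = initial;
    }

    // Computes F on a finite set by starting from the identity and merging in
    // the initial value of one source at a time.
    D apply(Set<S> sources) {
        D result = identity;
        for (S s : sources) {
            result = merge.apply(initial.apply(s), result);
        }
        return result;
    }
}
```

Because the merge operator is required to be commutative and associative, the iteration order of the set does not affect the result.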

The lattice-like property imposes several conditions on these parameters. In the proofs
below I assume that F is surjective, i.e., that for every element d of D there is a set P such
that F[P] = d. This is ensured by an appropriate choice of D.

• $D_M$ must be commutative:

$D_M(F[P], F[Q]) = F[P \cup Q] = F[Q \cup P] = D_M(F[Q], F[P])$

• $D_M$ must be associative:

$D_M(F[P], D_M(F[Q], F[R])) = F[P \cup (Q \cup R)] = F[(P \cup Q) \cup R] = D_M(D_M(F[P], F[Q]), F[R])$

• $D_M$ must be idempotent:

$D_M(F[P], F[P]) = F[P \cup P] = F[P]$

• $D_E$ must be an identity for $D_M$:

$F[Q] = F[\{\} \cup Q] = D_M(F[\{\}], F[Q]) = D_M(D_E, F[Q])$
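
For a small, finite D these four conditions can be checked by brute force. The sketch below is an illustration only; the class `MergeLaws` is hypothetical and is not something the thesis describes Ajax as providing.

```java
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical checker: exhaustively tests the four conditions on a finite domain.
final class MergeLaws {
    static <D> boolean holds(List<D> domain, BinaryOperator<D> m, D e) {
        for (D a : domain) {
            if (!m.apply(a, a).equals(a)) return false;   // idempotent
            if (!m.apply(e, a).equals(a)) return false;   // e is an identity
            for (D b : domain) {
                if (!m.apply(a, b).equals(m.apply(b, a))) return false;   // commutative
                for (D c : domain) {
                    D left = m.apply(a, m.apply(b, c));
                    D right = m.apply(m.apply(a, b), c);
                    if (!left.equals(right)) return false;   // associative
                }
            }
        }
        return true;
    }
}
```

Boolean "or" with identity false passes this check, whereas integer addition with identity 0 fails it because addition is not idempotent.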

In  practice,  it  has  not  been  difficult  to  identify  the  appropriate  F  function  and  D  parameters 
for  each  tool.  In  fact,  a  small  set  of  F  functions  has  proved  to  be  sufficient  for  a  variety  of 
tools.  Many  tools  use  the  same  F  function  and  distinguish  themselves  by  varying  the  S  and 
T  sets.  Some  examples  are  shown  below  in  Section  4.3. 

4.2.4  Tool  Target  Data 

Sections  4.2.2  and  4.2.3  describe  how  analyses  compute  F-values  for  each  expression  in  the 
target  set  T.  However,  the  expressions  T  themselves  are  generally  of  no  interest  to  a  tool. 
For  example,  the  downcast  checker  is  only  interested  in  the  location  of  the  downcast 
instruction. Therefore each tool specifies a map $T_R$ associating tool target data with each
target expression. The analysis computes

$\{\, (d, F[\{ s \in S \mid \exists t \in T.\ s \leftrightarrow_P t \land T_R(t) = d \}]) \mid d \in \mathrm{range}\ T_R \,\}$

To  compute  a  result  for  a  given  tool  target  datum,  the  analysis  merges  the  results  for  all 
target  expressions  associated  with  the  datum. 
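
The merging step can be sketched as a grouping of per-target results by datum. The helper `TargetData.mergeByDatum` below is hypothetical; it assumes the F-value of each target expression has already been computed, and it relies on the lattice-like property (F of a union equals the merge of the F-values) to justify merging per-target results instead of recomputing F on the combined source set.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical helper: collapses per-target F-values into per-datum results.
final class TargetData {
    static <T, R, D> Map<R, D> mergeByDatum(Map<T, D> perTarget,
                                            Function<T, R> targetData,
                                            BinaryOperator<D> merge) {
        Map<R, D> byDatum = new HashMap<>();
        for (Map.Entry<T, D> e : perTarget.entrySet()) {
            // Map.merge stores the value if the datum is new, otherwise
            // combines it with the existing value using the merge operator.
            byDatum.merge(targetData.apply(e.getKey()), e.getValue(), merge);
        }
        return byDatum;
    }
}
```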

In  the  absence  of  tool  target  data,  most  tools  would  need  to  maintain  their  own  maps  from 
target  expressions  to  data  they  find  meaningful.  The  tool  target  data  mechanism  factors  out 
this  code  into  a  shared  module.  Target  data  are  also  useful  when  a  tool  associates  the  same 
datum  with  more  than  one  expression,  because  merging  is  automatically  performed.  The 
Ajax  live  code  detector  exploits  this  feature,  as  explained  in  Section  4.3.5  below. 

4.2.5  Summary  of  Analysis  Parameters 

This  is  the  final  list  of  parameters: 

•  A  finite  set  S  of  source  expressions 

•  A  finite  set  T  of  target  expressions 

• A function F described by four parameters:

    • A type D of intermediate data

    • A merge operator $D_M : D \times D \to D$ satisfying the conditions of Section 4.2.3

    • An identity $D_E$ satisfying the conditions of Section 4.2.3

    • An initial assignment $D_I : S \to D$

• A type R of target data

• A tool target data map $T_R : T \to R$

The analysis defines

$F[\{\}] = D_E$

$F[\{s\} \cup Q] = D_M(D_I(s), F[Q])$

The analysis then computes the result of the query:

$\{\, (d, F[\{ s \in S \mid \exists t \in T.\ s \leftrightarrow_P t \land T_R(t) = d \}]) \mid d \in \mathrm{range}\ T_R \,\}$

4.3  Examples 

4.3.1  Finding  Writers  to  a  Field 

Section  3.5.1  presents  an  example  VPR  query  to  find  which  instructions  write  values  into 
a  field.  This  query  only  needs  to  determine  which  target  expressions  are  related  to  a  given 
single source expression. The output of the tool is a list of the locations of those
expressions.

The  query  parameters  are  simple.  The  function  F  returns  true  if  the  input  set  is  non-empty 
(i.e.,  contains  the  source  expression)  and  false  otherwise. 

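$S = \{\, pc : \texttt{stack-0} \,\}$

$T = \{\, pc' : \texttt{stack-1} \mid \mathrm{Instruction}_P(pc') = \texttt{putfield}\ \mathit{field} \,\}$

$D = \{\, \mathit{true}, \mathit{false} \,\}$

$D_M(a, b) = a \lor b$

$D_E = \mathit{false}$

$D_I(pc : \texttt{stack-0}) = \mathit{true}$

$R = \mathrm{CodeLoc}$

$T_R(pc' : \texttt{stack-1}) = pc'$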

The  analysis  returns  “true”  for  the  program  locations  whose  target  expressions  are  related 
to  the  source  expression.  The  tool  prints  out  these  locations. 

4.3.2  Finding  Unused  Fields 

The tool discussed in Section 3.5.2 determines whether a given getfield instruction
always returns zero or null. Consider an extension of that tool to check all getfield
instructions simultaneously. This tool needs to compute one bit of information for each
getfield instruction, so we make the getfield instructions the targets.
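
$S = \{\, pc' : \texttt{stack-0} \mid \mathrm{Instruction}_P(pc'-1) = \texttt{new}\ \mathit{class} \;\lor\; \mathrm{Instruction}_P(pc'-1) = \texttt{instanceof}\ \mathit{class} \;\lor\; \mathrm{Instruction}_P(pc'-1) = \texttt{iadd} \;\lor\; (\mathrm{Instruction}_P(pc'-1) = \texttt{bipush}\ n \land n \neq 0) \,\}$

$T = \{\, pc : \texttt{stack-0.field} \mid \mathrm{Instruction}_P(pc) = \texttt{getfield}\ \mathit{field} \,\}$

$D = \{\, \mathit{true}, \mathit{false} \,\}$

$D_M(a, b) = a \lor b$

$D_E = \mathit{false}$

$D_I(pc' : \texttt{stack-0}) = \mathit{true}$

$R = \mathrm{CodeLoc}$

$T_R(pc : \texttt{stack-0.field}) = pc$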

Similarly  to  the  previous  example,  the  analysis  returns  “true”  for  the  locations  whose  target 
expressions  are  related  to  any  of  the  source  expressions.  These  are  the  locations  of  the 
getfield instructions that might not return zero or null. The tool outputs the locations
for  which  the  analysis  returns  “false”. 

4.3.3  Downcast  Checking 

These  are  the  analysis  parameters  for  the  downcast  checker: 

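$S = \{\, pc : \texttt{stack-0} \mid \mathrm{Instruction}_P(pc-1) = \texttt{new}\ \mathit{class} \,\}$

$T = \{\, pc' : \texttt{stack-0} \mid \mathrm{Instruction}_P(pc') = \texttt{checkcast}\ \mathit{class} \,\}$

D is the class lattice for P (see below)

$D_M$ is the join operation in D

$D_E$ is the bottom element in D

$D_I(pc : \texttt{stack-0}) = \mathit{class}$, where $\mathrm{Instruction}_P(pc-1) = \texttt{new}\ \mathit{class}$

$R = \mathrm{CodeLoc}$

$T_R(pc' : \texttt{stack-0}) = pc'$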

The  class  lattice  for  program  P  is  P’s  Java  class  hierarchy,  including  interfaces,  extended 
to  form  a  lattice.  The  standard  class  hierarchy  does  not  form  a  lattice  for  two  reasons.  It  does 
not  have  a  “bottom”  element  to  serve  as  the  identity  for  a  join  operation,  and  therefore  we 
add  a  synthetic  bottom  element.  Also,  two  classes  may  not  have  a  unique  most  specific 
common  superclass,  such  as  classes  ClassP  and  ClassQ  in  the  hierarchy  of  Figure  4-3. 

To complete the lattice, we add elements representing the intersections of sets of classes
and interfaces. In this example, the most specific common superclass of ClassP and
ClassQ is the synthetic intersection class “ClassA $\cap$ InterfaceB”.

For each checkcast instruction, the result of the analysis is the most specific common
superclass of all the classes of objects subjected to the checkcast instruction. If this
superclass is a subclass of (or equal to) the bound specified in the checkcast instruction,
then the downcast is safe; otherwise it may fail.
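
The join and the final comparison can be sketched with Java reflection. This is a deliberately simplified illustration: the class `ClassJoin` is hypothetical, it handles ordinary (non-interface) classes only and so never needs the synthetic intersection elements described above, and it uses `null` to stand for the synthetic bottom element.

```java
// Hypothetical sketch of the join restricted to non-interface classes.
final class ClassJoin {
    // Most specific common superclass of a and b; null is the bottom element.
    // Assumes neither argument is an interface or a primitive type.
    static Class<?> join(Class<?> a, Class<?> b) {
        if (a == null) return b;
        if (b == null) return a;
        Class<?> c = a;
        // Walk up from a until reaching a class that b can be assigned to.
        // For ordinary classes this stops at java.lang.Object at the latest.
        while (!c.isAssignableFrom(b)) {
            c = c.getSuperclass();
        }
        return c;
    }

    // The cast is safe if the joined class is the bound or a subclass of it.
    // Bottom (no object reaches the cast) is treated as trivially safe.
    static boolean castIsSafe(Class<?> joined, Class<?> bound) {
        return joined == null || bound.isAssignableFrom(joined);
    }
}
```

Applied to Figure 4-1, joining Integer and String yields Object, which is not a subclass of Integer, so a cast to Integer of a value drawn from both creation sites is reported as possibly failing.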

4.3.4 Method Call Resolution

Consider  a  tool  designed  to  resolve  dynamic  method  calls  through  a  given  method  signature 
M.  For  each  dynamic  method  call  site,  the  tool  determines  whether  there  is  exactly  one 
possible  callee,  and  if  so,  which  method  it  is.  Dynamic  method  call  sites  with  only  one 
possible  callee  can  be  converted  into  direct  calls  by  a  compiler,  resulting  in  faster  method 
call  code  and  possible  inlining  of  the  callee. 

Because  the  tool  computes  information  for  each  call  site,  the  call  sites  are  the  targets.  (In 
general,  whenever  the  tool’s  query  can  be  phrased  in  the  form  “for  every  X,  compute  Y”, 
the  choices  for  X  determine  the  set  of  targets  T.)  At  each  site,  the  target  expression  is  the 
object  reference  upon  which  the  call  is  dispatched.  The  source  expressions  are  the  results 
of  the  new  instructions  that  create  objects  implementing  M.  By  determining  which  of  those 
sources  are  related  to  the  receiving  object  at  a  call  site,  the  call  can  be  resolved,  or  found  to 
be  unresolvable. 

Instead  of  collecting  the  complete  list  of  source  expressions  related  to  each  target,  it  is  more 
efficient  to  extract  just  the  salient  information.  We  associate  with  each  source  expression 
the method implementing M in the new object. The tool collects the set of methods reaching
each  call  site. 

Observe  that  if  a  set  of  callee  methods  at  a  call  site  has  more  than  one  element,  then  the  call 
cannot  be  statically  resolved  and  the  exact  contents  of  the  set  are  not  used.  Therefore  each 
set  can  be  abstracted  to  one  of  the  following  values: 

•  The  empty  set,  indicating  that  there  is  no  receiving  object.  This  implies  that  the  call  site 
is  in  dead  code  or  the  receiving  object  reference  is  always  null. 

• A singleton method, indicating that there is at most one receiving method
implementation. The call site can be resolved to the given method.

•  The  value  “many”,  indicating  that  the  set  of  possible  method  implementations  may 
have  more  than  one  element.  The  call  site  cannot  be  resolved  to  a  single  method. 

This abstraction is essentially the optimization proposed by Heintze and McAllester [41].

Let $\mathrm{Implementors}_P(M)$ denote the set of all methods implementing M. The tool uses the
following parameters:



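$S = \{\, pc : \texttt{stack-0} \mid \mathrm{Instruction}_P(pc-1) = \texttt{new}\ \mathit{class} \,\}$

$T = \{\, pc' : \texttt{stack-1} \mid \mathrm{Instruction}_P(pc') = \texttt{invokevirtual}\ M \,\}$ (stack-1 refers
to the receiving object in the call to M)

$D = \{\, \emptyset, \mathit{many} \,\} \cup \mathrm{Implementors}_P(M)$

$D_M(\emptyset, x) = D_M(x, \emptyset) = x$

$D_M(\mathit{many}, x) = D_M(x, \mathit{many}) = \mathit{many}$

$D_M(x, x) = x$

$D_M(x, y) = \mathit{many}$, when $x \neq y$

$D_E = \emptyset$

$D_I(pc : \texttt{stack-0}) = \mathit{impl}$, where $\mathrm{Instruction}_P(pc-1)$ = “new class”, and class’s
implementation of M has identifier impl

$R = \mathrm{CodeLoc}$

$T_R(pc' : \texttt{stack-1}) = pc'$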

The  tool  outputs  a  D  value  for  each  invokevirtual  instruction  specifying  method 
signature M. If the value is $\emptyset$, then the instruction is never reached. If the value is “many”,
then  the  instruction  cannot  be  statically  resolved.  Otherwise  the  value  is  the  name  of  the 
only  possible  callee  method. 
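
The three-way abstraction and its merge operator can be sketched as follows. The class `Callee` is a hypothetical encoding chosen for illustration, with method implementations identified by plain strings; the cases are tested in the order in which the merge rules are listed above, so the empty value takes precedence over “many”.

```java
import java.util.Objects;

// Hypothetical encoding of the domain: empty, a single method, or "many".
final class Callee {
    static final Callee EMPTY = new Callee(null);   // no receiving object
    static final Callee MANY = new Callee(null);    // more than one possible callee
    final String method;                            // non-null only for a single method

    private Callee(String method) {
        this.method = method;
    }

    static Callee of(String method) {
        return new Callee(Objects.requireNonNull(method));
    }

    // The merge operator: empty is the identity, "many" absorbs everything
    // else, and two single methods merge to themselves only if they are equal.
    static Callee merge(Callee x, Callee y) {
        if (x == EMPTY) return y;
        if (y == EMPTY) return x;
        if (x == MANY || y == MANY) return MANY;
        return x.method.equals(y.method) ? x : MANY;
    }
}
```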

Section  4.4.1  describes  how  this  tool  is  extended  to  examine  all  invokevirtual 
instructions  simultaneously. 

4.3.5  Live  Code  Detection 

Consider  a  tool  to  find  the  live  implementations  of  a  given  method  signature  M.  Such  a 
“live  code  detector”  is  rather  similar  to  the  method  call  resolver  in  the  previous  section, 
because  proper  identification  of  which  methods  are  live  requires  some  resolution  of 
dynamic  method  calls.  However,  the  live  code  detector  collects  information  about  methods 
rather  than  call  sites.  Therefore  the  tool  target  data  are  the  method  implementations;  the 
result  returned  for  each  method  is  “true”  if  it  may  be  live,  or  “false”  if  it  must  be  dead.  The 
parameters  are: 

$S = \{\, pc' : \texttt{stack-}n \mid \mathrm{Instruction}_P(pc') = \texttt{invokevirtual}\ M \,\}$, where n is the index
of the receiving object in the list of parameters of a call to M

$T = \{\, pc : \texttt{stack-0} \mid \mathrm{Instruction}_P(pc-1) = \texttt{new}\ \mathit{class} \,\}$

$D = \{\, \mathit{true}, \mathit{false} \,\}$

$D_M(a, b) = a \lor b$

$D_E = \mathit{false}$

$D_I(pc' : \texttt{stack-}n) = \mathit{true}$

$R = \mathrm{CodeLoc}$

$T_R(pc : \texttt{stack-0}) = \mathit{impl}$, where $\mathrm{Instruction}_P(pc-1)$ = “new class”, and class’s
implementation of M has identifier impl

In  a  sense,  this  query  propagates  “liveness”  from  call  sites  to  method  implementations, 
whereas  the  method  call  resolver  propagates  method  implementations  to  call  sites. 

This is an example of a tool which associates the same target datum with more than one
target expression. A method implementation is live if M is invoked on any object which
inherits that method implementation.

The  analysis  specified  here  does  not  detect  all  live  methods.  Calls  to  static  methods  must 
be  detected  separately.  In  Java,  there  is  also  an  invokespecial  instruction  which  calls 
non-static  methods  using  static  dispatch. 

4.4  Additional  Features  of  the  Ajax  Implementation 

4.4.1  Query  Families  and  Query  Fields 

The  examples  in  Sections  4.3.4  and  4.3.5  show  how  to  perform  method  call  resolution  or 
live  code  detection  for  a  specific  method  signature  M.  To  perform  these  tasks  for  all  method 
signatures,  it  suffices  to  perform  a  separate  query  for  each  signature  encountered  in  the 
program. Other tools also need to make many queries varying only their S, T, $D_I$, and $T_R$
parameters.

For greater efficiency and convenience, Ajax allows the remaining parameters (R, D,
$D_M$, and $D_E$) to be treated as a unit, a query family. Each query family defines an index
type,  I,  so  that  queries  belonging  to  each  query  family  are  indexed  by  elements  of  I.  In  the 
examples  above,  the  elements  of  I  are  the  method  signatures  M.  Ajax  is  designed  to  allow 
a  query  family  to  easily  manipulate  its  collection  of  queries  through  the  index  elements. 
Each  instance  of  an  analysis  can  efficiently  support  many  different  query  families  and 
many  queries  within  each  family. 

4.4.2  Incrementality 

Ajax  is  highly  incremental.  New  code  can  be  added  to  the  analyzed  program  at  any  time, 
in  response  to  program  modifications  or  environmental  changes.  The  results  of  the  analyses 
and  tools  are  updated  to  reflect  the  dynamic  changes.  This  requires  two  elaborations  of  the 
VPR  interface  presented  in  this  chapter. 

The query parameters S, T, $D_I$, and $T_R$ cannot be explicitly stated a priori, because the sum
of “all the code that might ever be live” is ill-defined or impractically large (for example,
it includes the entire Java class library, which is very large). Therefore whenever a new
method is added to the “live program,” the Ajax system calls back into the tool, notifying
it of the existence of the new method. The tool responds by extending its S, T, $D_I$, and $T_R$
parameters with the expressions whose locations are in the new method. The analyses must
be  capable  of  handling  such  dynamic  updates  to  the  parameters.  For  the  Ajax  analyses,  this 
was  tricky  to  implement  but  not  conceptually  difficult. 

Expressions  in  dead  methods  are  not  related  to  any  other  expressions,  even  themselves. 
Therefore,  if  a  tool  is  never  notified  of  the  existence  of  a  method,  the  results  for  target 
expressions in that method are trivially equal to $D_E$. In practice, tools have special handling
for  unreachable  source  or  target  expressions.  In  the  “find  writers  to  a  field”  example,  if  the 
source  expression  specifying  the  field  is  in  unreachable  code,  it  is  preferable  to  report  that 
fact  to  the  user  rather  than  to  report  that  there  are  no  writers  to  the  field. 

Since the results of an analysis can change when the analyzed program changes, results are
reported to a tool using a callback. When the analysis computes a new result for a tool target
datum, it reports the datum and result pair to the tool through the callback. In fact, the
analyses report results even before the analysis is complete; these results can be superseded
by subsequent callbacks. Ajax makes no guarantees of any relationship between these
“progressive  results”  and  the  final  result  for  a  target  datum.  However,  the  progressive 
results  can  be  used  for  advisory  purposes,  such  as  displaying  progress  to  a  user.  When  an 
analysis  completes,  it  signals  the  tool  that  the  last  reported  results  for  each  tool  target  datum 
are  sound. 
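
From the tool's side, this protocol can be pictured as a small object that records the most recent result per datum and a completion flag. The class `ResultCollector` and its method names are hypothetical and do not reflect the actual Ajax interface; the sketch only illustrates that results may be overwritten and become trustworthy once completion is signalled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tool-side receiver for progressive and final results.
final class ResultCollector<R, D> {
    private final Map<R, D> latest = new HashMap<>();
    private boolean complete = false;

    // Called whenever a new result is computed for a target datum.
    // A new result after completion means the program changed, so the
    // recorded results are no longer known to be final.
    void onResult(R datum, D value) {
        complete = false;
        latest.put(datum, value);
    }

    // Called when the analysis signals that the last reported results are sound.
    void onComplete() {
        complete = true;
    }

    boolean isSound() {
        return complete;
    }

    D get(R datum) {
        return latest.get(datum);
    }
}
```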

4.4.3  Code  Mutation 

Ajax  supports  changes  being  made  to  the  program  during  analysis,  and  even  after  analysis 
has  completed.  If  analysis  has  already  completed,  then  the  results  are  updated  progressively 
until  completion  is  signalled  again.  Many  tools  are  not  persistently  attached  to  the  program 
being  analyzed,  and  terminate  after  the  first  complete  results  have  been  delivered. 

The  implementation  of  code  mutation  is  quite  simple:  for  each  changed,  live  method, 
another  “live  method”  notification  is  sent  to  the  analysis.  It  is  up  to  each  analysis  to  decide 
how to handle multiple live method notifications for a single method. The analyses
implemented in Ajax generate new constraints for the new code and add them to the existing set
of  constraints  (i.e.,  old  constraints  are  not  revoked).  This  is  simple  and  does  not  penalize 
the  common  case  in  which  code  is  not  mutated. 

4.4.4  Analysis  Scoping 

No  analysis  for  Java  can  attempt  to  analyze  all  available  code,  because  the  standard 
libraries  are  so  large  that  performance  would  be  unacceptable.  The  code  to  be  analyzed 
must  be  identified  as  part  of  the  analysis.  A  natural  approach  is  to  compute  a  fixed  point 
from  below:  start  by  assuming  that  just  one  “main”  method  is  live,  analyze  it,  discover  other 
methods  that  may  be  called,  add  those  to  the  set  of  live  methods,  analyze  those  new 
methods,  and  so  on. 

Ajax’s  incremental  analysis  makes  this  simple.  A  live  code  detection  tool  is  instantiated, 
just as described in Section 4.3.5. It maintains a set of methods currently thought to be live;
this set is initialized to a “main” method by the tool environment. The analysis then runs
and  reports  results  to  the  live  code  detection  tool,  which  adds  new  live  methods  to  the  live 
method  set.  The  analysis  is  notified  of  these  new  live  methods,  computes  new  results, 
reports  them  to  the  tools,  and  the  cycle  continues.  This  means  that  typically  an  Ajax  system 
is  configured  with  two  tools:  a  live  method  detection  tool  to  control  the  scope  of  the 
analysis,  and  the  tool  that  the  user  is  actually  interested  in. 
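
The cycle can be sketched as a worklist computation of a fixed point from below. This is a simplification made for illustration: the class `LiveMethods` is hypothetical, methods are identified by strings, and the `callees` function stands in for "run the analysis on this method and report which methods it may call", whereas in Ajax that information comes from the live code detection tool's query results.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of growing the set of live methods from "main".
final class LiveMethods {
    static Set<String> fixedPoint(String main, Function<String, Set<String>> callees) {
        Set<String> live = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        live.add(main);
        pending.add(main);
        while (!pending.isEmpty()) {
            String method = pending.remove();
            for (String callee : callees.apply(method)) {
                // Set.add returns true only for newly discovered methods,
                // so each method is analyzed at most once.
                if (live.add(callee)) {
                    pending.add(callee);
                }
            }
        }
        return live;
    }
}
```

Methods that are never discovered this way are never analyzed, which is what keeps the standard libraries from being processed in full.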

This  approximation  of  the  set  of  live  methods  from  below  is  frequently  seen  in  prior  work, 
for  example  RTA  [9].  Ajax  extends  this  work  by  factoring  out  the  approximation  and 
applying  it  to  any  analysis. 

4.4.5  Intersection 

A natural extension of the framework presented above is to extend the operations on the
intermediate data D to make it a true lattice; i.e., to provide a meet operator $D_N$
corresponding to set intersection. This requires an additional lattice-like property of the tool’s F
function:

$F[P \cap Q] = D_N(F[P], F[Q])$

This  is  useful  for  analyses  that  compute  two  or  more  different,  but  individually  sound, 
approximations  to  the  value-point  relation.  The  intersection  of  two  sound  approximations 
to the true relation is also a sound approximation to the true relation. In other words, given
relations $\leftrightarrow_{1P}$ and $\leftrightarrow_{2P}$, the relation $\leftrightarrow_P$ defined as
$s \leftrightarrow_P t \equiv s \leftrightarrow_{1P} t \land s \leftrightarrow_{2P} t$ is a sound
approximation to the truth, and potentially more accurate than either of the input relations.

Now  consider  implementing  the  Ajax  interface  with  such  an  analysis,  and  computing  the  F 
values  for  a  tool: 

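$$
\begin{aligned}
& \{(t, F[\{s \in S \mid s \leftrightarrow_P t\}]) \mid t \in T\} \\
={} & \{(t, F[\{s \in S \mid s \leftrightarrow_{1P} t \wedge s \leftrightarrow_{2P} t\}]) \mid t \in T\} \\
={} & \{(t, F[\{s \in S \mid s \leftrightarrow_{1P} t\} \cap \{s \in S \mid s \leftrightarrow_{2P} t\}]) \mid t \in T\} \\
={} & \{(t, D_\cap(F[\{s \in S \mid s \leftrightarrow_{1P} t\}], F[\{s \in S \mid s \leftrightarrow_{2P} t\}])) \mid t \in T\}
\end{aligned}
$$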

Therefore,  it  suffices  to  compute  the  F  values  for  the  two  relations  separately  and  then  apply 
the  meet  operator. 

It  is  straightforward  to  implement  a  functor  that  takes  a  set  of  Ajax  analyses  and  combines 
them  in  this  way.  Of  course,  tools  must  provide  a  suitable  meet  operator.  The  examples 
above  which  use  boolean  values  as  their  intermediate  data  can  use  the  boolean  “and” 
operator  as  the  meet. 
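
As an illustration, the following sketch combines the F values that several analyses compute for a boolean tool, using “and” as the meet. The names MeetCombinerSketch, fValue and combined are hypothetical and not part of Ajax; each analysis is modeled simply as a predicate over (source, target) pairs.

```java
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical sketch: an analysis is modeled as a predicate that reports
// whether a (source, target) pair is in its VPR approximation.
class MeetCombinerSketch {
    // F value of a boolean tool for one target, using "or" as the merge:
    // true if some source expression is related to the target.
    static boolean fValue(BiPredicate<String, String> analysis,
                          Set<String> sources, String target) {
        boolean result = false; // identity element of "or"
        for (String s : sources) {
            result = result || analysis.test(s, target);
        }
        return result;
    }

    // Computes the F value separately for each analysis, then applies the
    // boolean "and" meet to the results.
    static boolean combined(List<BiPredicate<String, String>> analyses,
                            Set<String> sources, String target) {
        boolean result = true; // identity element of "and"
        for (BiPredicate<String, String> analysis : analyses) {
            result = result && fValue(analysis, sources, target);
        }
        return result;
    }
}
```

The combined value is true only when every analysis reports a possible relationship, so a single analysis that rules a target out is enough to rule it out overall.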

The  example  using  the  Java  class  lattice  explicitly  represents  the  meet  of  two  classes  as  an 
“intersection  class”  of  the  two  classes.  The  representation  of  intersection  classes  can  often 
be simplified by exploiting facts about the Java class hierarchy. For example, an intersection
class containing two non-interface classes is empty unless one of the classes is a
(possibly  indirect)  superclass  of  the  other,  because  multiple  inheritance  is  only  allowed  for 
interfaces. 
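
The simplification for two non-interface classes can be sketched as follows. This is only an illustration: it uses java.lang.Class reflection in place of Ajax's own class representation, and the names IntersectionClassSketch and meetOfClasses are hypothetical.

```java
import java.util.Optional;

class IntersectionClassSketch {
    // Simplifies the intersection of two non-interface classes: the result
    // is the more specific class, or empty if no object can belong to both.
    static Optional<Class<?>> meetOfClasses(Class<?> a, Class<?> b) {
        if (a.isInterface() || b.isInterface()) {
            throw new IllegalArgumentException("sketch handles non-interface classes only");
        }
        if (a.isAssignableFrom(b)) {
            return Optional.of(b); // a is b itself or a (possibly indirect) superclass of b
        }
        if (b.isAssignableFrom(a)) {
            return Optional.of(a); // b is a (possibly indirect) superclass of a
        }
        return Optional.empty();   // unrelated classes share no instances
    }
}
```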

Of the examples in this chapter, the method call resolution tool presents the most difficulties
in defining a suitable meet operator. The problem is that when both of the operands
of the meet are “many”, the precise result cannot be determined. The operator must return
“many”. This is a safe approximation, but the analysis parameters that we introduced for
efficiency are now causing us to lose information. For example, the sets { M1, M2 } and
{ M2, M3 } both map to the abstract value “many”; their intersection could be represented
with the abstract singleton { M2 }, but this cannot be computed from the abstract values
alone.  In  this  situation,  the  results  returned  to  the  tool  may  vary  from  run  to  run  depending 
on  the  order  of  analysis  computations,  even  if  the  underlying  analyses  compute  the  same 
VPR  approximations  in  each  run. 
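
The loss of information can be seen in a small sketch of such an abstract domain. The class Callees and its three-valued encoding (no callee, one callee, “many”) are assumptions made for illustration; the encoding used by the actual tool may differ.

```java
import java.util.Objects;

// Hypothetical abstract value for call resolution: no callee, exactly one
// known callee, or "many".
final class Callees {
    enum Kind { NONE, ONE, MANY }

    static final Callees NONE = new Callees(Kind.NONE, null);
    static final Callees MANY = new Callees(Kind.MANY, null);

    final Kind kind;
    final String method; // non-null only when kind == ONE

    private Callees(Kind kind, String method) {
        this.kind = kind;
        this.method = method;
    }

    static Callees one(String method) {
        return new Callees(Kind.ONE, Objects.requireNonNull(method));
    }

    // Merge: abstracts set union.
    Callees merge(Callees other) {
        if (kind == Kind.NONE) return other;
        if (other.kind == Kind.NONE) return this;
        if (kind == Kind.ONE && other.kind == Kind.ONE && method.equals(other.method)) return this;
        return MANY;
    }

    // Meet: abstracts set intersection as far as the abstraction allows.
    Callees meet(Callees other) {
        if (kind == Kind.NONE || other.kind == Kind.NONE) return NONE;
        if (kind == Kind.MANY) return other; // covers "many" meet "many" = "many"
        if (other.kind == Kind.MANY) return this;
        return method.equals(other.method) ? this : NONE;
    }
}
```

Merging one("M1") with one("M2") and one("M2") with one("M3") both yield MANY, and the meet of those two results is again MANY even though the underlying sets intersect in the single method M2.
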

5 Implementing the Value-Point Relation With RTA


5.1  Introduction 

5.1.1  Introduction  to  Rapid  Type  Analysis 

Bacon  and  Sweeney  proposed  Rapid  Type  Analysis  [9]  as  a  fast  algorithm  for  resolving 
dynamic  method  calls  in  statically  typed  object  oriented  programs;  it  was  originally  applied 
to  C++  programs.  RTA  uses  static  type  information  to  resolve  dynamic  method  calls  as 
follows:  given  a  virtual  call  to  method  m  of  object  reference  v,  find  Cv,  the  static  class  of  v, 
and  compute  the  set  S  of  all  subclasses  of  Cv,  including  Cv  itself.  Soundness  of  the  static 
type implies that these classes are a superset of the possible classes that v can have at run-time.
Therefore if every class in S implementing m uses the same implementation of m, the
call  can  be  statically  resolved  to  that  implementation. 
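
A minimal sketch of this resolution step is shown below. The class hierarchy is modeled with plain maps keyed by class name; ChaSketch and its methods are hypothetical names, and interfaces are ignored for brevity.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class ChaSketch {
    final Map<String, String> superOf = new HashMap<>();       // class -> direct superclass
    final Map<String, Set<String>> declares = new HashMap<>(); // class -> methods it implements

    void addClass(String name, String superName, String... implemented) {
        if (superName != null) superOf.put(name, superName);
        declares.put(name, new HashSet<>(Arrays.asList(implemented)));
    }

    boolean isSubclassOrSelf(String cls, String ancestor) {
        for (String k = cls; k != null; k = superOf.get(k)) {
            if (k.equals(ancestor)) return true;
        }
        return false;
    }

    // The class whose implementation of m an object of class cls would run,
    // or null if m is not implemented (for example, it is abstract).
    String implementor(String cls, String m) {
        for (String k = cls; k != null; k = superOf.get(k)) {
            if (declares.get(k).contains(m)) return k;
        }
        return null;
    }

    // Resolves a virtual call to m on a reference with static class cv.
    // Returns the unique implementing class, or empty if there is no
    // implementation or more than one.
    Optional<String> resolve(String cv, String m) {
        Set<String> implementations = new HashSet<>();
        for (String cls : declares.keySet()) {
            if (!isSubclassOrSelf(cls, cv)) continue;
            String impl = implementor(cls, m);
            if (impl != null) implementations.add(impl);
        }
        return implementations.size() == 1
                ? Optional.of(implementations.iterator().next())
                : Optional.empty();
    }
}
```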

As  described,  this  is  also  known  as  Class  Hierarchy  Analysis  [32].  However,  RTA  adds  an 
important  extension  to  improve  accuracy  without  harming  efficiency.  Consider  the  Java 
program  in  Figure  5-1. 


abstract class Super {
    static int n;
    abstract void m();
}

class Sub1 extends Super {
    void m() { n = 1; }
}

class Sub2 extends Super {
    void m() { n = 2; }
}

class Main {
    void f() { new Sub2(); }
    void main(String[] args) { Super v = new Sub1(); v.m(); }
}

Figure  5-1.  A  simple  Java  program 


CHA determines that v has two possible implementations of m, one from Sub1 and one
from Sub2, and therefore the call v.m() cannot be resolved. However, RTA observes that
the method f() is never called and no object of class Sub2 is ever created, and therefore
v’s only possible implementation of m is from Sub1; the call is resolved.

In this example, RTA starts by assuming that Main.main is the only live method and that
no classes are instantiated. It examines the body of Main.main and discovers that Sub1
is instantiated and there is a dynamic method call to Super.m. At this point Sub1 is the
only class in the set of instantiated classes, so the only possible implementor of Super.m
is Sub1.m, which is added to the live method set. Then Sub1.m is examined, which does
not  add  any  new  methods  or  instantiated  classes.  Now  that  all  the  live  methods  have  been 
examined,  the  algorithm  terminates. 
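
The walk-through above can be sketched as a small fixpoint computation. The program model here is an assumption made for illustration: each method body is reduced to the classes it instantiates and the virtual calls it makes, and RtaSketch and Body are hypothetical names rather than Ajax classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RtaSketch {
    // A method body reduced to its "new" sites and its virtual calls,
    // each call stored as {staticClass, methodName}.
    static final class Body {
        final List<String> news = new ArrayList<>();
        final List<String[]> virtualCalls = new ArrayList<>();
        Body instantiates(String cls) { news.add(cls); return this; }
        Body calls(String staticClass, String method) {
            virtualCalls.add(new String[] { staticClass, method });
            return this;
        }
    }

    final Map<String, String> superOf = new HashMap<>(); // class -> direct superclass
    final Map<String, Body> bodies = new HashMap<>();    // "Class.method" -> body

    boolean isSubclassOrSelf(String cls, String ancestor) {
        for (String k = cls; k != null; k = superOf.get(k)) {
            if (k.equals(ancestor)) return true;
        }
        return false;
    }

    // The concrete method an object of dynamic class cls runs for m, or null.
    String dispatch(String cls, String m) {
        for (String k = cls; k != null; k = superOf.get(k)) {
            if (bodies.containsKey(k + "." + m)) return k + "." + m;
        }
        return null;
    }

    // Bottom-up fixpoint: grows the live method set and the instantiated
    // class set until neither changes.
    Set<String> liveMethods(String entry) {
        Set<String> live = new LinkedHashSet<>();
        live.add(entry);
        Set<String> instantiated = new HashSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String method : new ArrayList<>(live)) {
                Body body = bodies.get(method);
                changed |= instantiated.addAll(body.news);
                for (String[] call : body.virtualCalls) {
                    for (String cls : instantiated) {
                        if (!isSubclassOrSelf(cls, call[0])) continue;
                        String target = dispatch(cls, call[1]);
                        if (target != null) changed |= live.add(target);
                    }
                }
            }
        }
        return live;
    }
}
```

Populated with the program of Figure 5-1, liveMethods("Main.main") contains Main.main and Sub1.m but neither Main.f nor Sub2.m.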

The  efficacy  of  CHA  is  based  on  the  observation  that  in  most  object  oriented  programs, 
many  overridable  methods  in  fact  have  only  one  implementation.  These  include  methods 
in  an  abstract  interface  that  has  only  one  implementation,  and  methods  in  a  class  that  has 
no  subclasses.  RTA  extends  CHA  to  exploit  the  fact  that  even  when  there  is  more  than  one 
implementation  available,  many  programs  will  only  use  one  implementation. 

Both  the  RTA  and  CHA  algorithms  were  originally  tailored  to  the  problem  of  resolving 
dynamic  method  calls.  In  Ajax,  the  technique  underlying  RTA  is  generalized  away  from 
any  particular  problem  and  used  to  generate  VPR  information  in  response  to  arbitrary 
queries.  For  example,  the  Ajax  implementation  of  RTA  can  be  used  to  produce  information 
similar  to  that  produced  by  the  “type  based  alias  analysis”  of  Diwan  et  al.  [23]. 

By  decoupling  the  analysis  from  its  applications,  Ajax  makes  differences  between  analyses 
more  apparent.  For  example,  it  becomes  clear  that  Diwan  et  al.’s  basic  “type  based  alias 
analysis”  is  actually  slightly  less  precise  than  RTA,  because  it  lacks  an  analogue  of  “exact 
class  types”  (see  Section  5.2.4).  The  differences  were  previously  obscured  because  both  the 
analyses  and  their  applications  varied  in  tandem. 

5.1.2  Decomposing  RTA  in  Ajax 

In  Ajax,  RTA  is  restructured  into  four  distinct  activities: 

1 .  Computation  of  the  set  of  live  methods 

2.  Computation  of  the  set  of  instantiated  classes 

3. Construction of an approximation to the value-point relation using static type information
and the set of instantiated classes

4.  Application  of  the  value-point  relation  to  determine  the  callees  of  dynamic  method 
calls 

Section  4.4.4  explains  how  for  all  analyses,  Ajax  computes  a  live  method  set  using  a 
bottom-up fixpoint procedure, just as RTA does. This subsumes the first and fourth activities
above.

Computing  the  set  of  instantiated  classes  from  the  set  of  live  methods  is  trivial.  We  simply 
scan  the  method  bodies  for  occurrences  of  the  new  instruction  and  note  the  class  parameter 
of  each  such  instruction. 

The  subject  of  this  chapter  is  the  third  activity:  using  static  type  information  and  knowledge 
of  the  set  of  instantiated  classes  to  implement  the  Ajax  analysis  interface. 

Section  5.2  describes  how  this  information  is  used  to  approximate  the  value-point  relation. 
Section 5.3 shows how to structure the computation to support the efficient analysis parameters
described in Section 4.2. The chapter concludes with discussion of some extensions.

5.2 Approximating the Value-Point Relation


5.2.1  Overview 

Abstractly,  the  task  of  any  Ajax  analysis  is  to  determine  whether  a  given  pair  of  bytecode 
expressions (B1, B2) is in the value-point relation. The decision must be conservative; if
there  is  any  uncertainty,  the  analysis  must  assume  that  the  pair  is  in  the  relation.  The  RTA 
analysis  receives  as  input  a  set  L  of  the  methods  in  the  program  that  it  must  assume  to  be 
live.  It  also  has  access  to  the  program,  so  it  can  compute  the  class  hierarchy. 

The  basic  idea  is  to  find  static  types  for  B1  and  B2,  and  then  compare  the  types  to  decide 
whether  it  is  possible  for  a  value  to  conform  to  both  of  them  simultaneously.  These  two 
steps  are  elucidated  in  the  next  two  subsections. 

In  this  section  I  discuss  the  analysis  in  the  context  of  full  Java  bytecode  rather  than  the 
MJBC  subset  language,  because  MJBC  does  not  define  a  static  type  discipline  analogous 
to  the  Java  Virtual  Machine’s  “verification”  procedure  and  the  Java  type  system.  RTA 
depends  on  the  existence  and  soundness  of  such  a  type  system. 

5.2.2  Types  for  Bytecode  Expressions 

Each bytecode expression $B_i$ is a pair $(l_i, e_i)$ consisting of a program location $l_i$ and an
expression $e_i$ to be evaluated at that location. In principle, it is not difficult for Ajax RTA
to  compute  static  types  for  the  expressions,  because  the  Java  Virtual  Machine  computes 
them  while  type  checking  Java  bytecode  [48]. 

A  full  explanation  of  Java  bytecode  type  reconstruction  and  verification  is  beyond  the 
scope  of  this  thesis.  Such  an  explanation  can  be  found  in  references  such  as  the  Java  Virtual 
Machine Specification [48]. Simply put, the type reconstruction algorithm performs intraprocedural
dataflow analysis, propagating facts about the types of values along data flow
paths.  The  sources  of  type  information  are  type  annotations  on  the  bytecode  instructions. 

Ajax  RTA  has  some  requirements  that  are  not  met  by  the  standard  bytecode  verification 
algorithm. 

•  Ajax  RTA  differs  from  the  standard  JVM  verifier  in  the  way  it  merges  object  types  at 
control  flow  merge  points.  In  order  to  obtain  slightly  better  accuracy  for  RTA,  instead 
of  moving  up  the  class  hierarchy  to  the  most  specific  common  superclass  of  the  classes 
being merged, Ajax creates a union type of the two types. For example, suppose Sub1
and Sub2 are both subclasses of class Super. If a stack element has object type Sub1
along one path and type Sub2 along another path, the standard Java verifier will give
the element type Super at the point where the paths merge. Ajax will give the element
the set of types { Sub1, Sub2 }, interpreted as the union of those two types. If Super
has  additional  subclasses,  then  this  union  type  is  more  precise  than  the  type  Super. 

•  The  use  of  polymorphic  bytecode  subroutines  can  require  an  assignment  of  more  than 
one  possible  type  to  a  value-point.  In  particular,  if  the  location  is  within  a  subroutine 
and the expression refers to a local variable that the subroutine does not touch, the subroutine
may be called from multiple contexts that give different types to that variable.
Ajax  RTA  uses  dataflow  analysis  to  compute  union  types  for  this  case. 


•  Expressions  may  denote  local  variables  or  stack  elements  in  contexts  where  they  have 
not  yet  been  initialized.  In  this  case  the  “union  set”  of  types  is  set  to  be  empty,  which 
eventually  causes  the  analysis  to  report  that  such  expressions  are  not  related  to  any 
expression. 

•  For  an  expression  denoting  the  field  of  an  object,  Ajax  RTA  simply  uses  the  declared 
type  of  the  field.  (Field  names  in  a  bytecode  expression  are  always  fully  qualified  with 
the  name  of  the  class  declaring  the  field,  and  are  therefore  unambiguous.)  Therefore 
Ajax  computes  a  valid  type  even  if  the  expression  refers  to  a  field  of  an  uninitialized 
variable.  This  behavior  is  sound,  although  it  may  lead  to  unnecessary  pairs  in  the  VPR 
approximation.  In  practice  accuracy  does  not  suffer,  because  tools  do  not  use  such 
expressions. (Java bytecode verification usually ensures that code cannot use uninitialized
variables, and tools usually refer to variables at instructions where they are used or
defined.) 

•  Where  the  constant  null  occurs  in  the  bytecode,  we  assign  it  the  empty  type  set,  because 
null  values  do  not  induce  relationships  in  the  VPR. 

5.2.3  Computing  the  Relation 

Suppose two expressions B1 and B2 have union sets of Java bytecode types $S_1$ and $S_2$
respectively. If they are related in the VPR, then at run-time there is a non-null value v
appearing at both expressions. Thus, v must conform to at least one static type from $S_1$ and
at least one static type from $S_2$. Ajax checks all pairs of types $(s_1, s_2)$ in $S_1 \times S_2$ to see if
there could be such a v conforming to both types $s_1$ and $s_2$. If such a pair does not exist, then
there can be no relationship between the expressions; otherwise RTA assumes they are
related  and  includes  the  pair  in  its  VPR  approximation.  This  strategy  is  efficient  in  practice 
because  each  set  usually  contains  only  one  element;  the  special  cases  of  polymorphic 
subroutines  and  merging  different  object  types  are  rare.  If  one  of  the  sets  is  empty,  the 
algorithm  yields  the  correct  result:  the  expressions  are  not  related. 
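
The pairwise check can be sketched as below. The type-compatibility test is passed in as a parameter because it is developed separately in the text; UnionTypeCheckSketch and mayBeRelated are hypothetical names.

```java
import java.util.Set;
import java.util.function.BiPredicate;

class UnionTypeCheckSketch {
    // Two expressions may be related if some pair (s1, s2) drawn from their
    // union sets can share a non-null run-time value.
    static <T> boolean mayBeRelated(Set<T> unionSet1, Set<T> unionSet2,
                                    BiPredicate<T, T> mayShareValue) {
        for (T s1 : unionSet1) {
            for (T s2 : unionSet2) {
                if (mayShareValue.test(s1, s2)) return true;
            }
        }
        return false; // also reached when either union set is empty
    }
}
```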

Now the problem has reduced to the following: given two Java bytecode types $s_1$ and $s_2$,
can there be a non-null run-time value conforming to both $s_1$ and $s_2$?

To  determine  the  answer,  Ajax  constructs  a  directed  acyclic  graph  representing  the 
hierarchy  of  Java  bytecode  types.  Figure  5-2  is  an  example.  There  is  a  root,  TOP,  the 
supertype  of  all  other  types.  The  primitive  types  int,  long,  float,  and  double  are  all 
distinct.  There  is  a  special  type  for  bytecode  return  addresses,  which  arise  when  the  Java 
try/finally construct is compiled into bytecode jsr and ret instructions. The Java
class hierarchy is inserted into the type graph, rooted at class Object. Interfaces such as
Serializable are also treated as types, which means that classes can have multiple
direct supertypes, as shown by String and Component in the example. Each type representing
a class (but not an interface) is labelled to indicate whether or not any objects with
that  dynamic  class  can  actually  be  created  by  the  program.  In  the  example,  the  instantiated 
types  are  shown  in  bold.  Primitive  types  and  return  addresses  are  always  considered  to  be 
instantiated. 

If a run-time value conforms to static types $s_1$ and $s_2$, then its “run-time type” must be an
instantiated type. Therefore the intersection of the subgraphs rooted at $s_1$ and $s_2$ must
contain at least one instantiated type. In other words, if there is no instantiated type
reachable from both $s_1$ and $s_2$, then no non-null run-time value can conform to both $s_1$ and
$s_2$.

Figure 5-2 shows that no non-null value conforms to both ItemSelectable and
Serializable, nor to both Object and Return Address. On the other hand, there may be
a  non-null  value  conforming  to  both  Serializable  and  Component;  it  must  be  a 
Label. 
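
The reachability test can be sketched as follows. The graph used in the usage note is a made-up fragment loosely modeled on the types named in the text, not a transcription of Figure 5-2; TypeGraphSketch is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TypeGraphSketch {
    final Map<String, List<String>> subtypes = new HashMap<>(); // type -> direct subtypes
    final Set<String> instantiated = new HashSet<>();

    void addSubtype(String superType, String subType) {
        subtypes.computeIfAbsent(superType, k -> new ArrayList<>()).add(subType);
    }

    // All types reachable from root along subtype edges, including root.
    Set<String> reachable(String root) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            String type = stack.pop();
            if (seen.add(type)) {
                for (String sub : subtypes.getOrDefault(type, Collections.emptyList())) {
                    stack.push(sub);
                }
            }
        }
        return seen;
    }

    // True if some instantiated type is reachable from both s1 and s2.
    boolean mayShareValue(String s1, String s2) {
        Set<String> common = reachable(s1);
        common.retainAll(reachable(s2));
        common.retainAll(instantiated);
        return !common.isEmpty();
    }
}
```

For example, with edges Object to Component, Component to Label, Serializable to Label and Serializable to String, and with Label and String instantiated, mayShareValue("Serializable", "Component") holds because both reach Label, while an ItemSelectable node with no instantiated subtypes shares nothing with Serializable.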

The  smaller  primitive  types  boolean,  byte,  short  and  char,  do  not  occur  in  the  graph 
because the Java Virtual Machine treats them as ints internally; the precise type is significant
only when the value is loaded or stored in an object field or array. Therefore Ajax
RTA  treats  these  types  as  identical  to  int. 

Array types require special treatment. Every array type (e.g. String[]) has an associated
class in the Java bytecode, but the array classes do not capture the full subtyping properties
of arrays. Every array class is a subclass of Object, Cloneable, and Serializable,
so every array type is a subtype of these types. However, every array of type T[] is also a
subtype of S[] when T is a subtype of S. (This subtyping relationship is not semantically
reasonable; in fact it is unsound without dedicated run-time checks. Nevertheless, the Java
Virtual Machine does allow a variable with static type S[] to refer to an object of type
T[].) These covariant subtyping relationships are not reflected in the JBC class hierarchy.
Ajax  RTA  adds  these  relationships  to  the  graph  separately. 

The  TOP  type  is  included  because  some  situations  arise  where  the  type  of  an  expression  is 
not  known.  This  can  happen  when  expressions  refer  to  native  code  specifications  —  see 
Section  8.3.5. 

5.2.4  Exact  Class  Types 

In  general,  when  a  variable  with  a  class  type  C  occurs  in  a  Java  bytecode  program,  we 
conclude  that  its  value  is  an  object  of  class  C  or  any  subclass  of  C.  However,  when  the 
variable is the direct result of a new operation, we know that its value is an object of precisely
the class specified in the new instruction. In this case, we give the variable an exact class type
“C-Only”. The only values conforming to this static type are objects of class C and no other.

This  extension  is  necessary  in  order  for  Ajax  RTA  to  be  as  accurate  as  traditional  RTA.  To 
see  this,  suppose  Ajax  RTA  is  used  with  the  type  graph  of  Figure  5-2  to  resolve  the  dynamic 
method call s.hashCode() in the program fragment in Figure 5-3.


void f(String s, Object o) {
    s.hashCode();
    o.hashCode();
}

... x = new Object(); y = new String(); z = new Label();


Figure  5-3.  A  fragment  illustrating  the  need  for  exact  class  types 

The  query  tries  to  resolve  the  method  call  by  collecting  all  classes  C  such  that  the  result  of 
a  “new  C”  instruction  is  related  to  the  variable  s.  Those  classes  are  the  possible  receivers 
of  the  method  call. 

Without exact class types, the static type of s is String, and the static types of x and y
are Object and String respectively. Because Object and String can have a non-null
value in common (namely, any String), Ajax RTA would conclude that s is related
to both sites, and therefore both Object and String can receive the method call.
Because they have different implementations of hashCode, the call to s.hashCode()
would  not  be  resolved. 

With exact class types, the static type of s is still String, but the static types of new
Object and new String are the exact class types “Object-Only” and “String-Only”.
Object-Only does not have any non-null values in common with String.
Therefore, the only new site matching s is new String, and the call is resolved as
expected. 

The  changes  to  the  type  graph  are  simple:  Every  inexact  class  type  C  that  is  instantiated 
gains  a  new  subtype,  “C-Only”.  C-Only  has  no  subtypes  and  its  sole  supertype  is  C.  The 
instantiation  annotations  are  changed  to  indicate  that  exact  class  types  are  instantiated 
directly  but  inexact  class  types  are  not.  The  graph  in  Figure  5-2  is  transformed  into  the 
graph  of  Figure  5-4. 
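
The graph transformation can be sketched as follows, assuming the type graph is stored as a map from each type to its direct subtypes and that the instantiated set contains class types only. ExactTypesSketch and addExactTypes are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ExactTypesSketch {
    // Adds a subtype "C-Only" under every instantiated class type C and
    // returns the new set of instantiated types, which contains only the
    // exact types. The subtypes map is modified in place.
    static Set<String> addExactTypes(Map<String, List<String>> subtypes,
                                     Set<String> instantiated) {
        Set<String> exactTypes = new TreeSet<>();
        for (String cls : instantiated) {
            String only = cls + "-Only";
            // C-Only has no subtypes of its own; its sole supertype is C.
            subtypes.computeIfAbsent(cls, k -> new ArrayList<>()).add(only);
            exactTypes.add(only);
        }
        return exactTypes;
    }
}
```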

5.3  Implementing  the  Ajax  Analysis  Interface 

The  previous  section  specifies  the  approximation  to  the  value-point  relation  computed  by 
Ajax  RTA.  This  section  describes  an  efficient  implementation  of  the  Ajax  analysis 
interface  using  this  approximation. 

Recall  that  the  interface  specifies  the  following  parameters  to  the  analysis: 

•  A  type  D  of  intermediate  data  to  be  propagated 

•  A  type  R  of  tool  target  data 

• An associative, commutative, idempotent binary “merge” operator $D_M : D \times D \to D$
with identity element $D_E$

•  A  set  S  of  source  expressions  from  which  data  will  be  propagated 

•  A  set  T  of  target  value-points  to  which  data  will  be  propagated 

• An initial assignment of intermediate data to source expressions $D_I : S \to D$

• A map from target expressions to tool target data $T_R : T \to R$

The analysis computes:

$\{(d, F[\{s \in S \mid \exists t \in T.\ s \leftrightarrow_P t \wedge T_R(t) = d\}]) \mid d \in \mathrm{range}\ T_R\}$

where

$F[\{\}] = D_E$

$F[P \cup Q] = D_M(F[P], F[Q])$

$F[\{s\}] = D_I(s)$

This  is  computed  efficiently  using  an  extension  of  the  subtype  graph. 

5.3.1  The  Data  Propagation  Graph 

Suppose that the original type graph given above consists of types $Y$ with a subtype relation
$Y_{sub}$. (If $y_1$ has a subtype $y_2$ then $(y_1, y_2) \in Y_{sub}$.) Let $Y_I$ be the subset of $Y$ whose types are
actually instantiated. Ajax RTA constructs a new propagation graph with nodes

$PN = \{\text{In-}t \mid t \in Y\} \cup \{\text{Out-}t \mid t \in Y\}$

and edges

$PE = \{(\text{In-}y_1, \text{In-}y_2) \mid (y_1, y_2) \in Y_{sub}\} \cup \{(\text{Out-}y_2, \text{Out-}y_1) \mid (y_1, y_2) \in Y_{sub}\} \cup \{(\text{In-}y, \text{Out-}y) \mid y \in Y_I\}$

Informally,  we  make  a  copy  of  the  subtype  graph,  flip  the  copy  upside  down,  and  then  paste 
it  below  the  original  graph  with  edges  connecting  original  nodes  to  their  copies,  but  only 
for  the  nodes  corresponding  to  types  that  are  actually  instantiated.  The  graph  in  Figure  5-4 
is  transformed  into  the  graph  shown  in  Figure  5-5. 
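
The construction can be sketched as follows. Node names are plain strings of the form "In-T" and "Out-T"; PropagationGraphSketch, build and pathExists are hypothetical names, and the sketch assumes every type mentioned in the subtype pairs is also listed in the type set.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class PropagationGraphSketch {
    // ySub holds pairs {y1, y2}, meaning y2 is a direct subtype of y1.
    static Map<String, Set<String>> build(Set<String> types, List<String[]> ySub,
                                          Set<String> instantiated) {
        Map<String, Set<String>> edges = new TreeMap<>();
        for (String t : types) {
            edges.put("In-" + t, new TreeSet<>());
            edges.put("Out-" + t, new TreeSet<>());
        }
        for (String[] pair : ySub) {
            edges.get("In-" + pair[0]).add("In-" + pair[1]);   // original graph
            edges.get("Out-" + pair[1]).add("Out-" + pair[0]); // flipped copy
        }
        for (String y : instantiated) {
            edges.get("In-" + y).add("Out-" + y); // link only instantiated types
        }
        return edges;
    }

    // Depth-first search for a path between two nodes.
    static boolean pathExists(Map<String, Set<String>> edges, String from, String to) {
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(from);
        while (!pending.isEmpty()) {
            String node = pending.pop();
            if (node.equals(to)) return true;
            if (seen.add(node)) {
                pending.addAll(edges.getOrDefault(node, Collections.emptySet()));
            }
        }
        return false;
    }
}
```

With types Object, String, Object-Only and String-Only, where only the exact types are instantiated, there is a path from In-String to Out-Object but none from In-String to Out-Object-Only.
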

Lemma: Let “:” be the relation between expressions and their RTA types, as explained in
Section 5.2.2. RTA relates $s \leftrightarrow t$ if and only if there is a path from In-$j_s$ to Out-$j_t$ where
$s : j_s$ and $t : j_t$.

Proof: The RTA approximation to the value-point relation defines $s \leftrightarrow t$ to mean that there
is an instantiated type $w$ and types $j_s$, $j_t$ such that $w$ is a subtype of $j_s$ and $j_t$, $s : j_s$ and $t : j_t$.
This implies that in the original type graph there is a path from $j_s$ to $w$ and from $j_t$ to $w$. Thus
in the propagation graph there is a path from In-$j_s$ to In-$w$ and from Out-$w$ to Out-$j_t$. There
is an edge from In-$w$ to Out-$w$ because $w$ is instantiated. Thus there is a path from In-$j_s$ to
Out-$j_t$.

Now suppose there is a path from In-$j_s$ to Out-$j_t$ where $s : j_s$ and $t : j_t$. There must exist an
edge in the path connecting In-$w$ to Out-$w'$ for some $w$ and $w'$. All such edges are of the
form (In-$y$, Out-$y$) where $y$ is an instantiated type, therefore $w = w'$ and $w$ is an instantiated
type. Furthermore there is a path from In-$j_s$ to In-$w$; this path passes only through In nodes
(because there are no edges from any Out node back to an In node). This implies that there
is a path from $j_s$ to $w$ in the original graph, which means $w$ is a subtype of $j_s$. Likewise, the
path from Out-$w$ to Out-$j_t$ implies there is a path from $j_t$ to $w$ in the original graph, meaning
$w$ is also a subtype of $j_t$. Combining all these facts about $w$ shows that RTA will conclude
$s \leftrightarrow t$.

5.3.2  Computing  Analysis  Results 

Now Ajax computes an assignment A of intermediate data D to the nodes of the propagation
graph, satisfying the following for all nodes y:

$A(y) = F[\{D_I(s) \mid s \in S \wedge \mathrm{PathFrom}(\text{In-}j_s, y) \wedge s : j_s\}]$

The  idea  is  to  start  by  assigning  the  initial  data  to  each  associated  node,  and  then  propagate 
the  data  along  the  graph  edges,  merging  the  incoming  data  at  each  node.  An  example  is 
given  below. 

Ajax  computes  A  iteratively  as  follows: 

$A_0(y) = F[\{D_I(s) \mid s \in S \wedge \text{In-}j_s = y \wedge s : j_s\}]$

$A_{n+1}(y) = F[\{A_n(p) \mid (p, y) \in PE\} \cup \{A_n(y)\}]$

Initially A is set to the initial data associated with the In nodes. At each iteration, the value
at each node is updated from the values at all the node’s predecessors. The loop terminates
when $A_{n+1}(y) = A_n(y)$ for every node y.

The  result  of  the  analysis  is  then: 

$\{(d, F[\{A(\text{Out-}j_t) \mid \exists t \in T.\ T_R(t) = d \wedge t : j_t\}]) \mid d \in \mathrm{range}\ T_R\}$

For  each  tool  target  datum  d ,  this  last  pass  collects  and  merges  the  values  from  each  graph 
node  associated  with  a  target  expression  associated  with  d. 

The  correctness  of  this  result  follows  immediately  from  the  lemma  in  Section  5.3.1. 
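
The propagation can be sketched with a worklist, which reaches the same fixed point as the round-by-round iteration as long as the merge operator is associative, commutative and idempotent. PropagationSketch, propagate and union are hypothetical names; graph nodes are plain strings.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.BinaryOperator;

class PropagationSketch {
    // Propagates data along the edges until no node value changes.
    // Nodes without an entry in the result implicitly hold the identity.
    static <D> Map<String, D> propagate(Map<String, List<String>> edges,
                                        Map<String, D> initial,
                                        BinaryOperator<D> merge, D identity) {
        Map<String, D> a = new HashMap<>(initial);
        Deque<String> worklist = new ArrayDeque<>(initial.keySet());
        while (!worklist.isEmpty()) {
            String node = worklist.remove();
            D value = a.getOrDefault(node, identity);
            for (String succ : edges.getOrDefault(node, Collections.emptyList())) {
                D old = a.getOrDefault(succ, identity);
                D merged = merge.apply(old, value);
                if (!merged.equals(old)) { // value changed: revisit successors
                    a.put(succ, merged);
                    worklist.add(succ);
                }
            }
        }
        return a;
    }

    // Set union, usable as the merge operator for set-valued data.
    static Set<String> union(Set<String> x, Set<String> y) {
        Set<String> result = new TreeSet<>(x);
        result.addAll(y);
        return result;
    }
}
```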


5.3.3  Example 

Consider  the  problem  of  determining  the  callees  of  the  dynamic  method  calls  in  the 
program  fragment  in  Figure  5-3,  using  the  graph  in  Figure  5-5.  The  query  is  set  up  as 
follows: 

An intermediate datum is a set of implementations of hashCode. The class Label
inherits its hashCode method from Object, and therefore there are only two distinct
implementations of hashCode: Object.hashCode and String.hashCode.

$D = \mathcal{P}(\{\text{Object.hashCode}, \text{String.hashCode}\})$

$D_M = \cup$

$D_E = \emptyset$

S = { x at statement x = ..., y at statement y = ..., z at statement z = ... }

T = { s at statement s.hashCode(), o at statement o.hashCode() }

R = { statement s.hashCode(), statement o.hashCode() }

$T_R$ maps each expression to the statement it occurs in

The  initial  datum  assignment  maps  the  result  of  each  new  instruction  to  the  implementation 
of  hashcode  used  by  the  created  object: 

$D_I$ = [x $\to$ { Object.hashCode }, y $\to$ { String.hashCode },
z $\to$ { Object.hashCode }]

The  initial  A  is 

$A_0$ = [In-Object-Only $\to$ { Object.hashCode },
In-String-Only $\to$ { String.hashCode },
In-Label-Only $\to$ { Object.hashCode }]

All  types  not  explicitly  mapped  are  mapped  to  the  empty  set. 

These  values  are  propagated  down  the  graph,  using  set  union  to  merge  them  at  nodes  with 
multiple  incoming  edges.  The  final  value  of  A  is: 

A = [In-Object-Only $\to$ { Object.hashCode },
In-String-Only $\to$ { String.hashCode },
In-Label-Only $\to$ { Object.hashCode },
Out-Object-Only $\to$ { Object.hashCode },
Out-String-Only $\to$ { String.hashCode },
Out-Label-Only $\to$ { Object.hashCode },
Out-Label $\to$ { Object.hashCode },
Out-Component $\to$ { Object.hashCode },
Out-String $\to$ { String.hashCode },
Out-Serializable $\to$ { String.hashCode },
Out-Object $\to$ { Object.hashCode, String.hashCode },
Out-TOP $\to$ { Object.hashCode, String.hashCode }]

Thus Ajax RTA determines that the call to s.hashCode has possible receivers
A(Out-String) = { String.hashCode }, and the call to o.hashCode has possible
receivers A(Out-Object) = { Object.hashCode, String.hashCode }. That is, the
statement s.hashCode() will always call the implementation in the String class (and
could be replaced by a static method call), but the statement o.hashCode() may call the
implementation in the String class or the implementation in the Object class.

5.3.4  Performance 

Ajax  RTA  implements  the  above  algorithm  using  a  worklist.  The  number  of  steps  required 
is  simply  the  number  of  times  an  element  of  A  is  changed.  Typically  a  tool  chooses  its  DM 
operator  so  that  the  data  at  a  node  can  only  change  a  small  number  of  times  before  reaching 
a  fixed  point.  If  DM  is  thought  of  as  a  lattice  join  operator,  then  the  tool  should  choose  a 
lattice  with  a  small  height.  If  the  height  is  indeed  bounded  by  a  small  constant,  then  the  time 
to  compute  A’s  fixed  point  is  proportional  to  the  size  of  the  propagation  graph,  which  is 
roughly  proportional  to  the  size  of  the  program.  If  the  sizes  of  the  S  and  T  sets  are  also 
proportional  to  the  size  of  the  program,  the  whole  algorithm  runs  in  linear  time. 

Quantitative  performance  measurements  of  this  implementation  of  RTA  are  presented  in 
Section  9.4. 

5.3.5  Incrementality 

The  algorithm  described  here  is  quite  simple.  However,  the  implementation  is  nontrivial 
because  many  of  the  inputs  are  updated  dynamically,  and  the  analysis  must  update  its 
results  dynamically  in  response.  In  particular: 

•  The  live  method  set  can  increase  at  any  time,  which  means  that  new  classes  may  be 
found  to  have  instances. 

•  The  set  of  classes  in  the  program  can  increase  at  any  time,  as  they  are  loaded  on 
demand.  This  means  that  classes  can  acquire  new  subclasses. 

• At any time, a tool can add to its S set and T set and corresponding $D_I$ and $T_R$ entries.

None  of  these  issues  have  a  major  impact  on  performance,  but  they  significantly  complicate 
the  implementation,  because  new  nodes  and  edges  are  added  to  the  propagation  graph 
during  processing. 

5.4  RTA++:  Tracking  Typecases 

5.4.1  Motivation 

Java  lacks  a  “typecase”  statement  or  expression.  Instead,  the  programmer  must  use  a 
combination  of  instanceof  and  downcasts  to  first  test  whether  an  object  belongs  to  a 
certain  class,  and  then  downcast  the  object  reference  if  it  belongs  to  the  class.  Figure  5-6 
shows  an  example;  similar  patterns  occur  frequently  in  many  programs.  The 
instanceof  guard  ensures  that  the  downcast  is  completely  safe. 

I  have  extended  Ajax  RTA  to  prove  that  these  downcasts  are  safe.  The  resulting  analysis  is 
called  “RTA++”. 

class C {
    Object fieldA;
    Object fieldB;

    public boolean equals(Object x) {
        if (x instanceof C) {
            C c = (C)x;
            return c.fieldA.equals(fieldA) && c.fieldB.equals(fieldB);
        } else {
            return false;
        }
    }
}

Figure  5-6.  A  Java  program  using  instanceof  and  checkcast 


5.4.2  Refining  the  Bytecode  Type  Assignment 

The  idea  is  to  improve  the  accuracy  of  the  procedure  of  Section  5.2.2,  which  assigns  static 
Java  types  to  expressions.  In  Figure  5-6,  the  occurrence  of  x  inside  the  if  body  will  be 
assigned  the  Java  type  C.  The  analysis  then  concludes  that  x  can  only  be  aliased  to  instances 
of  C  or  its  subclasses;  with  this  information,  the  Ajax  downcast  checking  tool  proves  that 
the  downcast  is  safe. 

The  improved  static  type  assignment  requires  some  simple  intraprocedural  data  flow 
analysis. First, Ajax RTA computes “must alias” information for all local variables and
stack  elements,  using  value  numbering.  For  each  boolean  variable  or  stack  element,  Ajax 
also  determines  whether  the  value  corresponds  to  the  result  of  an  instanceof  operation, 
and  if  so,  which  variable  and  class  were  tested. 

The  basic  algorithm  for  computing  static  Java  types  for  value-points  uses  standard  forward 
data  flow  analysis.  For  each  instruction,  there  is  a  “transfer  function”  describing  how  the 
types  of  variables  and  stack  elements  at  the  successor  instruction(s)  depend  on  the  types  of 
the  variables  and  stack  elements  at  the  current  instruction.  In  the  RTA++  algorithm,  the 
transfer  function  corresponding  to  a  conditional  branch  checks  to  see  whether  the  branch 
condition  is  the  result  of  an  instanceof.  If  so,  then  in  the  “branch  taken”  case  all  known 
aliases  to  the  tested  variable  are  known  to  be  instances  of  the  tested  class.  This  fact  is  used 
to  narrow  the  types  assigned  to  the  aliased  variables  at  the  successor  instruction. 
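The narrowing step can be illustrated with a minimal sketch. It is not Ajax's code: the class and method names are hypothetical, static types are plain class names in a single-inheritance hierarchy, must-alias information is a map from variables to value numbers, and the transfer function shown applies only to the successor on which the instanceof test is known to have succeeded.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of instanceof-based type narrowing; not the Ajax RTA++ implementation. */
final class InstanceofNarrowing {
    /** Records that a boolean value is the result of "value instanceof testedClass". */
    static final class InstanceofFact {
        final int testedValue;      // value number of the tested reference
        final String testedClass;   // class named in the instanceof instruction
        InstanceofFact(int testedValue, String testedClass) {
            this.testedValue = testedValue;
            this.testedClass = testedClass;
        }
    }

    /** True if sub equals sup or sup is reachable from sub along superclass links. */
    static boolean isSubclass(String sub, String sup, Map<String, String> superclassOf) {
        for (String c = sub; c != null; c = superclassOf.get(c)) {
            if (c.equals(sup)) return true;
        }
        return false;
    }

    /**
     * Transfer function for the successor on which the instanceof test succeeded.
     * Every variable with the same value number as the tested value is a must-alias,
     * so its static type is narrowed to the tested class when that class is more specific.
     */
    static Map<String, String> narrowOnSuccess(Map<String, String> types,
            Map<String, Integer> valueNumbers, InstanceofFact fact,
            Map<String, String> superclassOf) {
        Map<String, String> out = new HashMap<>(types);
        if (fact == null) return out; // branch condition is not an instanceof result
        for (Map.Entry<String, Integer> e : valueNumbers.entrySet()) {
            String var = e.getKey();
            boolean mustAlias = e.getValue() == fact.testedValue;
            if (mustAlias && isSubclass(fact.testedClass, types.get(var), superclassOf)) {
                out.put(var, fact.testedClass);
            }
        }
        return out;
    }

    /** x and y share a value number, z does not; the test is "x instanceof C". */
    static String demo() {
        Map<String, String> superclassOf = new HashMap<>();
        superclassOf.put("C", "Object");
        Map<String, String> types = new HashMap<>();
        types.put("x", "Object");
        types.put("y", "Object");
        types.put("z", "Object");
        Map<String, Integer> valueNumbers = new HashMap<>();
        valueNumbers.put("x", 1);
        valueNumbers.put("y", 1);
        valueNumbers.put("z", 2);
        Map<String, String> out =
            narrowOnSuccess(types, valueNumbers, new InstanceofFact(1, "C"), superclassOf);
        return "x=" + out.get("x") + " y=" + out.get("y") + " z=" + out.get("z");
    }
}
```

In the demo, x and y are narrowed to C while z keeps the type Object, which is the information the downcast checking tool needs for the cast in Figure 5-6.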

Similar techniques have been used by JIT compilers [18] to reduce the overhead of
instanceof/checkcast pairs.

This  technique  could  also  improve  the  accuracy  of  other  tools  using  Ajax  RTA,  but  in 
practice the effect is only noticeable for the downcast checking tool.

6 The SEMI Analysis


6.1  Introduction 

6.1.1  Chapter  Overview 

Previous  work  [54]  investigated  using  Hindley-Milner  style  polymorphic  type  inference  to 
extract  a  VPR-like  relation  from  C  programs.  This  thesis  extends  that  work  by  introducing 
an  analysis  with  new  features,  including  support  for  Java  bytecode  programs.  This  analysis 
is  called  SEMI  (short  for  “semiunification”).  SEMI  combines  the  following  features: 

•  A  flexible  and  robust  framework  based  on  type  inference  with  polymorphic  recursion. 

•  A  number  of  modes  and  optimizations  allowing  varying  tradeoffs  between  time,  space 
and  accuracy. 

•  A formal model in terms of the Micro Java Bytecode language and the value-point
relation.

•  A  proof  of  soundness  in  terms  of  the  model. 

•  An  implementation  within  the  Ajax  framework  which  allows  SEMI  to  be  used  with  a 
variety  of  tools,  and  in  combination  with  other  analyses  such  as  RTA.  (However,  SEMI 
is  completely  independent  of  the  other  analyses.) 

Standard  analyses  based  on  type  inference  are  based  on  constraints.  They  define  a  language 
of  terms,  including  variables  standing  for  terms,  and  a  language  of  constraints  holding 
between  terms.  Syntax  driven  rules  specify  the  construction  of  an  initial  constraint  set  for 
any  given  program.  The  constraints  are  solved  to  find  canonical  or  minimal  solutions,  i.e., 
assignments  of  terms  to  variables.  The  inference  system  is  constructed  so  that  the  solutions 
represent  certain  invariants  of  the  program. 

SEMI  follows  a  similar  pattern.  However,  to  simplify  the  presentation,  SEMI  does  not  use 
terms;  term  structures  are  encoded  using  “component  constraints”,  and  information  about 
term  constructors  is  omitted.  In  SEMI,  constraints  hold  only  between  atomic  variables.  A 
SEMI  variable  can  be  thought  of  as  the  inferred  type  of  a  program  variable.  More 
discussion of this presentation is given below in Section 6.2.1.2.

Although SEMI is inspired by type inference, and it is useful to apply intuitions about type
inference to help understand SEMI, SEMI is not in fact a type inference algorithm.
Formally, it is nothing more than a system for computing an approximation to the
value-point relation. Nevertheless, in this chapter I use the word “type” to refer to information
computed by SEMI. Java types are largely irrelevant to SEMI, and my use of the word
“type” never refers to Java types unless explicitly noted.

This chapter gives a formal specification for SEMI, as applied to the Micro Java Bytecode
language, and a proof that any algorithm satisfying the specification computes a conservative
approximation to the VPR. The details of the implementation are deferred to the next
chapter.

6.1.2  Approach 

I have chosen to present a direct proof of soundness in terms of MJBC, rather than translating
to and from a more traditional lambda language and doing the proof in a conventional
setting.  Consequently,  the  proof  is  rather  long  and  the  style  may  be  unfamiliar.  However,  a 
proof  in  a  conventional  setting  would  also  be  rather  difficult,  because  even  after  translation 
the  system  would  contain  the  following  features: 

•  Higher-order  functions 

•  Polymorphic  functions 

•  Unrestricted  recursion  (declarations  not  block-structured) 

•  Records 

•  Row-polymorphism  (record  types  polymorphic  over  a  set  of  “unknown”  additional 
fields) 

•  Polymorphic  recursion 

•  Mutable  references 

•  Exceptions 

•  Soft  typing 

Specifying  and  proving  the  correctness  of  the  analysis  directly  in  terms  of  MJBC  also  keeps 
the  formal  presentation  closer  to  the  actual  implementation. 

6.1.3  Implications 

This  chapter  does  not  merely  confirm  facts  already  believed.  It  also  reveals  that  the 
analysis places no static constraints on the program whatsoever. Even though the
implementation assumes that the Java program passes bytecode verification and is therefore
statically well-typed according to the Java language rules, the system presented here does not.
In  other  words,  SEMI  could  be  implemented  without  making  any  assumptions  about  the 
target  program. 

This  is  useful  in  practice,  because  it  means  that  variations  in  the  static  verification  policies 
of  different  virtual  machines  have  no  impact  on  SEMI.  It  is  also  useful  because  it  means 
that Ajax could be applied to ill-formed programs, such as programs undergoing
modifications, provided those programs can be translated into bytecode.

Note  that  according  to  the  semantics  of  MJBC,  the  execution  of  a  program  which  would  not 
be  statically  well-typed  according  to  Java  may  reach  a  state  in  which  no  normal  transition 
is  possible.  For  example,  a  program  may  attempt  to  fetch  a  field  when  the  top  of  the 
working stack does not contain an object reference. However, according to the semantics,
a spontaneous exception throw is always possible. This implies that a program will never
“get stuck”; when no normal transition is possible, it will simply throw a spontaneous
exception. Of course, if the exception is not caught, the method call stack will unwind and
the program will eventually halt due to the uncaught exception.

This  is  realistic,  as  many  VMs  can  report  type  errors  during  execution,  when  code  is 
dynamically  and  lazily  linked.  SEMI  can  account  for  such  behavior. 

6.1.4  Relationship  to  the  Implementation 

The  constraints  and  rules  described  here  are  almost  the  same  as  those  implemented  in 
SEMI,  for  the  subset  of  Java  bytecode  corresponding  to  MJBC. 

One  small  but  significant  departure  of  this  formalism  from  the  implementation  is  the 
treatment  of  one  constraint  for  the  new  instruction.  (See  footnote  “a”  below,  on  page  112.) 
I  believe  that  the  implemented  constraint  is  correct,  but  it  would  require  significant 
additional  work  to  extend  the  proof  system  to  accommodate  it. 

SEMI’s  implementation  incorporates  a  number  of  optimizations  that  mean  some  of  the 
constraints  here  never  arise.  For  example,  exceptions  and  the  globals  object  are 
“globalized”  (see  Section  7.6),  and  no  instance  constraints  are  ever  applied  to  them.  When 
only  one  instance  of  a  particular  variable  is  possible,  SEMI  replaces  the  instance  constraint 
with  an  equality,  which  gives  the  same  results  and  saves  time  and  space.  (Intuitively,  if 
there  is  only  one  instance  of  a  polymorphic  value,  it  may  as  well  not  be  polymorphic.) 
These optimizations are applied in the constraint generation phase, so the constraint
generation code does not correspond closely to the description here. For details, see Chapter 7.

6.1.5  Chapter  Organization 

Section  6.2  describes  the  sets  of  constraints  used  by  SEMI,  and  defines  a  “closed  form”  for 
these  sets  that  represents  a  solution  to  the  constraints.  All  discussion  of  how  to  produce  such 
a  closed  form  is  deferred  to  Chapter  7.  Section  6.3  presents  an  informal  overview  of  how 
SEMI  treats  Java  programs,  by  translating  Java  bytecode  examples  into  a  functional 
language  whose  standard  typing  rules  would  induce  similar  constraints  to  SEMI’s. 

Section  6.4  defines  the  initial  constraint  set  for  an  MJBC  program  and  presents  a  complete 
example of a program and its analysis using constraints. Section 6.5 formally defines the
relationship between the VPR and constraint sets. The definition requires some auxiliary
judgements; that section defines them and proves some of their properties. Section 6.6
discusses the implementation of the Ajax tool interface using SEMI.

The  remainder  of  the  chapter  is  Section  6.7,  which  proves  that  any  closed  constraint  set 
gives  rise  to  a  sound  VPR  approximation.  This  is  similar  to  a  proof  of  soundness  of  a  type 
system,  but  rather  different  in  flavor  due  to  the  non-traditional  setting.  This  section,  and 
part  of  Section  6.5,  contain  a  great  deal  of  rather  dense  mathematics.  The  casual  reader 
should  focus  on  the  statements  of  lemmas  and  theorems,  which  describe  the  invariants  of 
SEMI that make it sound.

6.2 Constraint System

6.2.1  Constraints 

6.2.1.1 Constraint Structures

The  SEMI  solver  uses  the  following  structures: 

•  V: the set of variables

These can be thought of as type variables. Each program variable (or in general, each
bytecode expression) has a SEMI variable associated with it.

•  L: the set of component labels (e.g., param, result, fieldA)

SEMI treats these as abstract entities and assigns no meaning to them. They are used in
component constraints.

•  I: the set of instance labels

Each instance label represents a program site at which a polymorphic value is being
used. SEMI treats them as abstract entities and assigns no meaning to them. They are
used in instance constraints.

•  C: a set of constraints of the following kinds:

•  “$u = v$”: an equality constraint expressing the fact that the two variables u and v
are to be considered identical. In the presence of such a constraint, two bytecode
expressions which are mapped to constraint variables u and v respectively will be
considered related in the value-point relation.

•  “$u \rhd_c v$”: a component constraint expressing the fact that variable u’s component
with label c is variable v. These constraints can be thought of as encoding the structure
of terms. They are used to relate types of object references to the types of their
fields, and also the types of methods to the types of their parameters and results.

•  “$u \preceq_i v$”: an instance constraint expressing the fact that variable u’s instance i is
variable v. Intuitively, v can be thought of as the i’th copy of u. In the presence of
such a constraint, two bytecode expressions mapping to variables u and v respectively
will be considered related in the value-point relation.

If the constraint $u \preceq_i v$ is present in a set, then I write “v is an instance of u” and “u is a
source of v”. The set should be clear from context. If “$u \rhd_c v$” is in a set, then I write “v is
a component of u” and “u is a parent of v”.

The  rules  that  assign  an  initial  constraint  set  to  a  program  are  given  in  Section  6.4. 

6.2.1.2  Relationship  to  Terms 

To  illustrate  the  relationship  between  standard  polymorphic  recursion  [42]  and  this  setting, 
consider  the  following  code,  expressed  in  a  typed  lambda  calculus.  This  is  a  function  to 
swap  the  two  elements  of  a  pair. 

λx. (snd(x), fst(x))

where “fst” and “snd” are the standard projection operations on pairs. While performing
type inference with polymorphic recursion, the following constraint arises for the type of
“snd” itself, when we consider the invocation of the operator “snd”:

$(t_0, t_1) \to t_1 \;\preceq_{i_1}\; u_1 \to u_2$

This represents the fact that the type of “snd”, which is known to be $(t_0, t_1) \to t_1$ (where $t_0$
and $t_1$ are type variables standing for arbitrary types), is instantiated at program point $i_1$
to some currently unknown function type $u_1 \to u_2$ (where $u_1$ and $u_2$ are also type variables
standing for arbitrary types). ($i_1$ would be the program point of the call to the “snd”
function.) In other words, the type $u_1 \to u_2$ is constrained to be a polymorphic instance of
$(t_0, t_1) \to t_1$.

This  constraint  on  terms  could  be  translated  into  the  following  set  of  SEMI  constraints: 


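$\{\, T_{snd} \rhd_{param} T_{snd\text{-}p},\ T_{snd\text{-}p} \rhd_{tuple\text{-}0} t_0,\ T_{snd\text{-}p} \rhd_{tuple\text{-}1} t_1,\ T_{snd} \rhd_{result} t_1,$
$\ \ T_{snd} \preceq_{i_1} v,\ v \rhd_{param} u_1,\ v \rhd_{result} u_2 \,\}$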


Note  that  the  terms  have  been  decomposed  into  variables  related  by  component  constraints. 
This has required the introduction of new variables $T_{snd}$, $T_{snd\text{-}p}$, and $v$ to represent the
compound terms and subterms $(t_0, t_1) \to t_1$, $(t_0, t_1)$ and $u_1 \to u_2$ respectively. The term
constructors have disappeared entirely. This is why SEMI is not suitable as a type inference
system; it can never detect conflicts between type constructors. In a situation where term
unification would fail due to constructor mismatch, SEMI assigns different kinds of
components to the same variable. For example, it might infer that a variable has both
“tuple-n” and “param” components, as if the variable were both a tuple and a function. This
is  in  fact  an  advantage  for  SEMI;  it  will  never  reject  a  program  as  unsuitable  for  analysis. 
(In  other  words,  SEMI  is  a  “soft  typing”  system  [85].) 
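The decomposition of terms into component constraints can be sketched as follows. This is an illustrative sketch only: the `TermFlattener` class, the generated variable names (`T0`, `T1`, ...) and the textual constraint format are assumptions made for the example, not SEMI's representation. Note that the constructor of a compound term is never recorded; only its labelled subterms survive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: flattening a term into SEMI-style component constraints. */
final class TermFlattener {
    /** A term is a type variable (no children) or a compound term with labelled subterms. */
    static final class Term {
        final String name; // variable name; ignored for compound terms
        final Map<String, Term> children = new LinkedHashMap<>();
        Term(String name) { this.name = name; }
        Term with(String label, Term child) {
            children.put(label, child);
            return this;
        }
    }

    private final List<String> constraints = new ArrayList<>();
    private int fresh = 0;

    /**
     * Returns the variable standing for the term. For a compound term a fresh variable
     * is introduced and one component constraint is emitted per labelled subterm.
     */
    String flatten(Term t) {
        if (t.children.isEmpty()) return t.name;
        String v = "T" + (fresh++);
        for (Map.Entry<String, Term> e : t.children.entrySet()) {
            String child = flatten(e.getValue());
            constraints.add(v + " >" + e.getKey() + " " + child);
        }
        return v;
    }

    /** Constraints emitted so far, each written as "parent >label component". */
    List<String> constraints() { return constraints; }
}
```

Flattening the type of “snd” built this way yields a fresh variable for the function type and another for the pair, mirroring the roles of the two introduced variables in the constraint set above.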

The advantage of the SEMI representation is that it is very simple, yet carries all the
information required to perform the analysis. Its particular advantage is in representing recursive
structures, which are very common in this kind of analysis; standard term representations
need to be extended with recursive constructs such as “μt.T”, where “t” occurs free in T,
meaning the solution to the fixpoint equation “t = T(t)”.


6.2.2  Solutions 

A solution to a constraint set C is another constraint set C' such that $C \subseteq C'$ and C' is
closed. A closed constraint set can be thought of as a set in which all implicit relationships
implied by the constraints are stated explicitly. A VPR approximation can be efficiently
computed from such a set. C is closed if it satisfies the conjunction of the following
conditions (t, u, v and w range over constraint variables):

•  Equality  closure:  equality  constraints  in  a  closed  set  possess  the  usual  properties  of 
symmetry,  transitivity  and  substitutional  equivalence. 

$\forall t, u.\ \{t = u\} \subseteq C \Rightarrow \{u = t\} \subseteq C$
$\forall t, u, v.\ \{t = u,\ u = v\} \subseteq C \Rightarrow \{t = v\} \subseteq C$
$\forall t, u, v, c.\ \{t = u,\ t \rhd_c v\} \subseteq C \Rightarrow \{u \rhd_c v\} \subseteq C$
$\forall t, u, v, c.\ \{t = u,\ v \rhd_c t\} \subseteq C \Rightarrow \{v \rhd_c u\} \subseteq C$
$\forall t, u, v, i.\ \{t = u,\ t \preceq_i v\} \subseteq C \Rightarrow \{u \preceq_i v\} \subseteq C$
$\forall t, u, v, i.\ \{t = u,\ v \preceq_i t\} \subseteq C \Rightarrow \{v \preceq_i u\} \subseteq C$

Equality is meant to be reflexive, but it is troublesome to require reflexivity constraints
as explicit elements of the constraint set. The obvious rule $\forall u.\ \{u = u\} \subseteq C$ is
undesirable because it requires C to contain an infinite number of constraints. A more
complex definition is possible, but in fact there is no need for explicit reflexivity
constraints, so they are not required to be in the set.

•  Component  uniqueness:  a  variable  has  at  most  one  distinct  component  with  a  given 
label. 

$\forall t, u, v, c.\ \{t \rhd_c u,\ t \rhd_c v\} \subseteq C \Rightarrow \{u = v\} \subseteq C$

•  Instance  uniqueness:  a  variable  has  at  most  one  distinct  instance  with  a  given  label. 

$\forall t, u, v, i.\ \{t \preceq_i u,\ t \preceq_i v\} \subseteq C \Rightarrow \{u = v\} \subseteq C$

•  Component propagation: if a variable has a component with a given label, then each of
its instances also has a component with that label.

$\forall t, u, v, c, i.\ \{t \rhd_c u,\ t \preceq_i v\} \subseteq C \Rightarrow \exists w.\ \{v \rhd_c w\} \subseteq C$

•  Instance  propagation:  instance  relationships  propagate  to  matching  components. 

$\forall t, u, v, w, c, i.\ \{t \rhd_c u,\ t \preceq_i v,\ v \rhd_c w\} \subseteq C \Rightarrow \{u \preceq_i w\} \subseteq C$
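For a finite constraint set, the closure conditions above can be checked mechanically. The following is a minimal sketch under stated assumptions, not part of SEMI: constraints are encoded as four-element lists (kind, left variable, label, right variable), reflexive equalities are treated as implicitly present, and the consequent of the instance propagation rule is taken to be that the component of the source has, as its instance with the same label, the matching component of the instance.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a checker for the closure conditions; not SEMI's solver. */
final class ClosureCheck {
    // Each constraint is encoded as [kind, left, label, right].
    static List<String> eq(String u, String v) { return Arrays.asList("=", u, "", v); }
    static List<String> comp(String u, String c, String v) { return Arrays.asList("comp", u, c, v); }
    static List<String> inst(String u, String i, String v) { return Arrays.asList("inst", u, i, v); }

    /** Equality lookup that treats u = u as implicitly present. */
    static boolean hasEq(Set<List<String>> cs, String u, String v) {
        return u.equals(v) || cs.contains(eq(u, v));
    }

    /** Component and instance propagation for "t comp_c u" and "t inst_i v". */
    static boolean propagates(Set<List<String>> cs, String c, String u, String i, String v) {
        boolean found = false;
        for (List<String> n : cs) {
            if (n.get(0).equals("comp") && n.get(1).equals(v) && n.get(2).equals(c)) {
                found = true; // v has a component labelled c
                if (!cs.contains(inst(u, i, n.get(3)))) return false; // instance propagation
            }
        }
        return found; // false when component propagation is violated
    }

    static boolean isClosed(Set<List<String>> cs) {
        for (List<String> k : cs) {
            String kind = k.get(0), t = k.get(1), lab = k.get(2), u = k.get(3);
            if (kind.equals("=") && !hasEq(cs, u, t)) return false; // symmetry
            for (List<String> m : cs) {
                String mk = m.get(0), a = m.get(1), ml = m.get(2), b = m.get(3);
                if (kind.equals("=")) {
                    if (mk.equals("=")) {
                        if (a.equals(u) && !hasEq(cs, t, b)) return false; // transitivity
                    } else {
                        // substitution of u for t on either side of a component or instance constraint
                        if (a.equals(t) && !cs.contains(Arrays.asList(mk, u, ml, b))) return false;
                        if (b.equals(t) && !cs.contains(Arrays.asList(mk, a, ml, u))) return false;
                    }
                } else if (mk.equals(kind) && a.equals(t) && ml.equals(lab)) {
                    if (!hasEq(cs, u, b)) return false; // component or instance uniqueness
                } else if (kind.equals("comp") && mk.equals("inst") && a.equals(t)) {
                    if (!propagates(cs, lab, u, ml, b)) return false;
                }
            }
        }
        return true;
    }
}
```

The checker only recognizes closed sets; it does not construct one.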

Given any finite set of constraints C, there is always a finite solution set C' such that
$C \subseteq C'$ and C' is closed. For example, the set C' could be C with equality constraints
added between all variables mentioned in C, and all instance and component relationships
holding between all the variables. This would be a correct solution, but not a very useful
one because the induced value-point relation would relate every pair of bytecode
expressions.

A  more  realistic  strategy  is  to  interpret  the  closure  rules  as  production  rules.  At  each  step, 
if the set of constraints is not closed, the algorithm selects a rule whose hypothesis is
satisfied but whose consequent is not and adds the constraint required to satisfy the
consequent. Unfortunately, this algorithm does not terminate for practical examples.

Discussion  of  the  actual  SEMI  algorithm  is  deferred  to  Chapter  7.  In  this  chapter,  I  treat  it 
as  a  black  box  and  show  that  given  an  appropriate  set  of  initial  constraints,  any  closed 
solution  gives  rise  to  a  conservative  approximation  of  the  value-point  relation. 

6.2.3  Remarks 

Simplifications  of  the  closure  rules  give  rise  to  a  number  of  previously  studied  analyses. 
For example, if one takes only the equality closure rules plus the two rules below forcing
components and instances to be degenerate, one obtains a simple monomorphic,
structureless type inference analysis similar to Steensgaard’s [72]:

$\forall t, u, v, c.\ \{t \rhd_c u\} \subseteq C \Rightarrow \{u = t\} \subseteq C$
$\forall t, u, v, i.\ \{t \preceq_i u\} \subseteq C \Rightarrow \{u = t\} \subseteq C$

If  one  takes  only  the  equality  rules  and  the  component  uniqueness  rule,  and  forces  instances 
to  be  degenerate,  then  one  obtains  a  monomorphic  type  inference  analysis  with  structures. 
This  system  essentially  performs  simple  term  unification.  Cycles  in  the  graph  of  component 
constraints are allowed, and correspond to recursive type terms.

With the full treatment of polymorphic instance constraints as described, the system
corresponds to type inference with polymorphic recursion using semiunification, again with
recursive  terms  allowed.  (The  term  “polymorphic  recursion”  means  that  cycles  in  the  graph 
of  instance  constraints  are  allowed,  such  as  when  a  polymorphic  function  recursively  calls 
itself  and  passes  in  one  of  its  original  parameters.) 
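As a rough illustration of the monomorphic analysis with structures mentioned above (equality closure plus component uniqueness, with no instance constraints), the following sketch maintains equivalence classes with a union-find structure. The class `MonoSolver` and its method names are hypothetical and are not taken from the SEMI implementation. Because there are no term constructors, merging never fails: the component maps of the two classes are simply combined, and cyclic component graphs (recursive types) need no special treatment.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of monomorphic unification over component constraints. */
final class MonoSolver {
    private final Map<String, String> parent = new HashMap<>();
    // For each class representative: label -> some variable of the component's class.
    private final Map<String, Map<String, String>> comps = new HashMap<>();

    /** Representative of v's equivalence class, with path compression. */
    String find(String v) {
        String p = parent.get(v);
        if (p == null) return v; // v is its own representative
        String r = find(p);
        parent.put(v, r);
        return r;
    }

    /** Adds u = v; components with equal labels are merged recursively. */
    void addEquality(String u, String v) {
        String ru = find(u), rv = find(v);
        if (ru.equals(rv)) return;
        parent.put(ru, rv); // link first, so recursion on cyclic structures terminates
        Map<String, String> moved = comps.remove(ru);
        if (moved == null) return;
        for (Map.Entry<String, String> e : moved.entrySet()) {
            addComponent(rv, e.getKey(), e.getValue());
        }
    }

    /** Adds "u has component v under label"; enforces component uniqueness. */
    void addComponent(String u, String label, String v) {
        Map<String, String> m = comps.computeIfAbsent(find(u), k -> new HashMap<>());
        String existing = m.putIfAbsent(label, v);
        if (existing != null) addEquality(existing, v);
    }

    /** True if u and v are in the same class, i.e. related by the induced approximation. */
    boolean related(String u, String v) {
        return find(u).equals(find(v));
    }
}
```

For example, after recording that p and q each have a fieldX component (a and b respectively), adding the equality p = q forces a and b into the same class.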

In  general  it  is  not  possible  to  compute  a  “most  general”  or  “principal”  closed  constraint 
set.  This  is  discussed  further  in  Section  7.1.2. 

6.3  The  Encoding 

6.3.1  Introduction 

SEMI  generates  a  set  of  initial  constraints  directly  from  a  bytecode  program  and  then  solves 
them to find a closed form. However, the procedure can be viewed conceptually as a
translation from the bytecode language into an extended lambda calculus, followed by
generation of type constraints for the translated code, followed by solution of the type constraints
to  yield  inferred  types.  Here  I  provide  an  informal  description  of  SEMI  from  the  latter  point 
of  view. 

6.3.2  Methods 

Each  Java  bytecode  method  declaration  is  translated  to  a  function  declaration.  Each 
function can take multiple parameters directly; no currying is used. The implicit “this”
parameter of non-static methods becomes an explicit parameter in the translation.
Functions return two values: the value returned normally by the method, and the thrown
exception, if any. Methods that return nothing (“void”) have a return value in the
translation, but the value is always ignored. (In the formal MJBC semantics, every function
returns a value, so this issue does not arise.)

Therefore  this  method  that  adds  3  to  x 

int  add3(int  x)  {  load  x;  bipush  3;  iadd;  ireturn;  } 

translates  to  the  equivalent  of 

fun  add3  (this,  x)  =  (x  +  3,  ...) 

The  “...”  indicates  that  there  is  no  value  for  the  exception;  its  type  is  unconstrained.  This 
means  that,  after  type  inference,  the  type  of  the  exception  will  be  a  unique  type  variable. 
SEMI  will  conclude  that  the  exception  is  not  related  in  the  VPR  to  any  other  value,  as  one 
would  hope,  since  there  is  in  fact  no  exception.  (Obviously  “...”  precludes  the  translated 
code  from  being  executable,  but  that  is  not  a  problem.)  (A  sum  type  could  be  used  instead 
of  a  pair,  to  indicate  that  only  one  of  the  alternatives  is  possible,  but  this  leads  to  essentially 
the  same  type  constraints.) 

Methods  are  assigned  function  types.  The  above  method  would  be  assigned  the  following 
“type”: 

$\mathit{add3} : \forall a, b, e.\ (a) \to (b, e)$

The intuition behind the interpretation of these types is that if two variables can be inferred
to  have  different  types,  then  they  cannot  be  aliased  in  the  VPR  sense.  If  they  are  always 
inferred  to  have  the  same  type,  then  they  may  be  aliased. 

Even  though  the  x  parameter’s  real  type  is  int,  we  assign  it  a  type  variable  so  that  we  can 
compare  its  type  meaningfully  with  the  types  of  other  variables  which  also  hold  integers. 
For  example,  here  we  can  see  that  the  value  returned  by  add3  is  a  new  integer,  different 
from  the  parameter.  (We  can  also  see  that  the  parameter  and  result  are  both  different  from 
whatever  exception  may  be  thrown  by  add3.) 

In  SEMI,  these  inferred  types  become  atomic  constraint  variables  connected  by  component 
constraints as discussed in Section 6.2.1.2. For example, the above type would be
represented as

add3: T, where the constraint set contains

$\{\, T \rhd_{param\text{-}0} a,\ T \rhd_{result} b,\ T \rhd_{exn} e \,\}$

6.3.3  Global  Variables 

Global  variables  (Java  “static  fields”)  are  passed  into  all  functions  in  an  extra  record 
parameter.  Each  slot  of  the  record  corresponds  to  one  global  variable.  For  example,  the 
method 

int getGlobal() {
    getstatic globalVar; ireturn;
}

translates to the equivalent of

fun getGlobal(globals) =
    (globals.globalVar, ...)

The function simply reads the global variable and returns its value; no exception is produced.
The following type signature would be inferred for this function:

$\mathit{getGlobal} : \forall a, e, \rho.\ (\{\, \mathit{globalVar} : a;\ \rho \,\}) \to (a, e)$

This signature requires globals to have a field globalVar of type $a$, which must be
the same type as the result. The polymorphic type variable $\rho$, sometimes referred to as a
“row variable”, represents the types of an unknown set of other fields of globals (i.e.,
other global variables). This signature allows the other global variables to have any type.

This treatment of globals means that all function bodies are closed, i.e., refer only to
variables  defined  locally  or  available  as  parameters,  or  to  other  functions.  Therefore,  in  the 
type  inferred  for  each  function,  every  type  variable  can  be  polymorphically  generalized.  (In 
the  language  of  Hindley-Milner  type  inference,  every  type  variable  is  free  in  the  enclosing 
type  environment.) 

If  global  variables  were  instead  declared  as  variables  in  the  enclosing  environment,  e.g., 

let globalVar = ref 0 in
fun getGlobal() = (globalVar, ...)

then the type signatures would be

$\mathit{globalVar} : a$
$\mathit{getGlobal} : \forall e.\ () \to (a, e)$

The  expression  ref  0  indicates  that  globalVar  is  mutable  and  therefore  its  type  cannot 
be  polymorphically  generalized;  usage  of  globalVar  in  different  contexts  may  refer  to 
the  same  runtime  value,  and  therefore  globalVar  must  have  the  same  type  a  in  all 
contexts.  Similarly,  in  the  type  inferred  for  getGlobal,  a  cannot  be  polymorphically 
generalized  because  it  is  constrained  to  the  type  of  globalVar. 

The  two  strategies  actually  produce  the  same  analysis  results,  because  even  when  each 
function  takes  the  global  variable  record  as  a  polymorphic  parameter,  there  is  really  only 
one  global  variable  record  in  the  program  and  one  “canonical”  type  for  this  record  (its  type 
in  the  program’s  main  function).  This  “top  level”  type  is  a  polymorphic  instance  of  every 
other  type  for  the  global  variable  record.  Lemma  6-21  below  and  Section  7.6  explain  this 
in  more  detail. 

For  simplicity,  SEMI  uses  explicit  global  variable  passing,  so  that  every  type  variable  in  a 
function signature is polymorphically generalized. The implementation performs
optimizations for types (such as the types of global variables) that have only one meaningful
instance;  this  is  discussed  in  Section  7.6.  In  the  rest  of  this  section  the  global  variable 
passing  is  ignored  for  the  sake  of  brevity. 

The  “row  variables”  do  not  occur  in  SEMI’s  constraints.  They  are  implicit.  For  example, 
the  above  method  would  be  given  the  following  constraints: 

getGlobal:  T 

where  the  constraint  set  contains 

$\{\, T \rhd_{globals} T_{globals},\ T_{globals} \rhd_{globalVar} v,\ T \rhd_{result} v,\ T \rhd_{exn} e \,\}$

6.3.4  Object  Encoding 

Java  objects  are  treated  as  extensible  records,  each  similar  to  the  “global  variables”  record. 
Each  slot  of  the  record  contains  either  a  field  or  a  method.  For  example,  the  code 

int getX() {
    load this; getfield fieldX; ireturn;
}

would translate to (ignoring the globals object for now)

fun getX(this) =
    (this.fieldX, ...)

This would get type signature

$\mathit{getX} : \forall a, d, \rho.\ (\{\, \mathit{fieldX} : a;\ \rho \,\}) \to (a, d)$

Here this is deconstructed into a record containing field fieldX of type $a$ and some set
of other fields of types $\rho$. Effectively, this function and its type say nothing about what
other fields of this there may be. Any object containing a fieldX can be passed in. In
fact, any object at all can be passed in, and the type inference algorithm will infer that it
contains fieldX. This “row polymorphism” avoids any need for subtype polymorphism
in this type system. (This complete reliance on row polymorphism distinguishes this type
system from the type system of O’Caml [65], where row polymorphism is available but
explicit classes and subtyping are usually used instead.) It also helps reduce the sizes of
types inferred for functions, because only fields actually used by the function are given
types in the function’s signature.

Field  names  are  always  fully  qualified  with  the  name  of  the  class  in  which  they  are 
declared,  so  two  fields  of  different  classes  which  happen  to  have  the  same  name  are  never 
confused  in  the  translation. 

The  Java  class  of  an  object  is  never  represented  in  the  translation  or  in  the  type  inference 
system.  The  implications  of  this  are  discussed  in  the  following  sections.  Tools  based  on 
SEMI  can  recover  class  information  using  the  VPR;  this  is  discussed  in  Chapter  10  and 
elsewhere. 

6.3.5  Method  Encoding 

6.3.5.1  Static  Methods 

Static  methods  are  treated  as  normal  functions.  A  call  to  a  static  method  is  translated  into  a 
direct call to the appropriate function. For example, the code in Figure 6-1 would be
translated to the equivalent of the code in Figure 6-2.


static int addOne(int x) {
    load x; bipush 1; iadd; ireturn;
}

static int addOneWrapper(int y) {
    load y; invokestatic addOne; ireturn;
}

Figure  6-1.  Static  Method  Example 


fun addOne(x) =
    (x + 1, ...)

fun addOneWrapper(y) =
    (addOne(y), ...)

Figure  6-2.  Static  Method  Translation 

Because  the  function  addOne  is  a  polymorphic  value,  its  use  in  addOneWrapper  is 
assigned  a  fresh  polymorphic  instance  of  the  type  of  addOne.  All  calls  to  static  methods 
are treated polymorphically. (In other words, static method calls are analyzed with
calling-context sensitivity.) Intuitively, this is safe because (being closed) distinct calls to addOne
are  completely  independent  and  cannot  communicate  except  through  the  caller’s 
environment. 
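
The idea of taking a fresh polymorphic instance at each call site can be sketched in a few lines of Python. This is only an illustration, not SEMI's implementation: the representation of type schemes, the helper names `fresh_var` and `instantiate`, and the example scheme are all assumptions made for the sketch.

```python
import itertools

_fresh = itertools.count()

def fresh_var():
    """Return a new, unused type variable name."""
    return f"t{next(_fresh)}"

def instantiate(scheme):
    """Copy a type scheme, replacing each quantified variable with a fresh one.

    A scheme is (quantified_vars, type); a type is either a variable name (str)
    or a tuple ("fun", param_type, result_type).
    """
    quantified, body = scheme
    renaming = {v: fresh_var() for v in quantified}

    def copy(ty):
        if isinstance(ty, str):
            return renaming.get(ty, ty)
        tag, param, result = ty
        return (tag, copy(param), copy(result))

    return copy(body)

# Hypothetical scheme for a function whose result aliases its parameter:
# forall a. (a) -> a
scheme = (["a"], ("fun", "a", "a"))
first_use = instantiate(scheme)    # one call site
second_use = instantiate(scheme)   # another, independent call site
```

Each call site gets its own copy of the quantified variables, so constraints discovered at one call cannot leak into another; within a single copy, the parameter and result still share one variable.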

6.3.5.2  Nonstatic  Methods 

Nonstatic  methods  —  that  is,  methods  involved  in  dynamic  dispatch  —  are  encoded  by 
treating  them  as  functions  assigned  to  the  slots  of  objects  when  those  objects  are  created. 
For  example,  the  code  in  Figure  6-3  would  be  translated  to  the  equivalent  of  the  code  in 
Figure  6-4. 
class MyObj {
    int fieldX;

    int MyObj_getX(MyObj this) {
        load this; getfield MyObj_fieldX; ireturn;
    }
}

static int getter(Object o) {
    load o; invokevirtual getX; ireturn;
}

static int main() {
    new MyObj; invokestatic getter; ireturn;
}

Figure  6-3.  Nonstatic  Method  Example 


fun MyObj_getX(this) =
    (this.MyObj_fieldX, ...)
fun getter(o) = (o.getX)(o)
fun main() =
    let obj = { getX: MyObj_getX; MyObj_fieldX: 0; }
    in getter(obj)

Figure  6-4.  Nonstatic  Method  Translation 

The  following  types  are  inferred: 

MyObj_getX: $\forall a, e, p.\ (\{\ \mathtt{MyObj\_fieldX}: a;\ p\ \}) \to (a, e)$

getter: $\forall b, e, p.\ (t) \to (b, e)$ where $t = \{\ \mathtt{getX}: (t) \to (b, e);\ p\ \}$

obj (in main): $u$ where $u = \{\ \mathtt{getX}: (u) \to (c, e);\ \mathtt{MyObj\_fieldX}: c;\ p\ \}$
(for some $c, e, p$)

Note  that  objects  containing  methods  usually  have  recursive  types,  because  the  type  of  the 
this  parameter  in  each  method  type  is  usually  the  same  as  the  object  type. 

Another  example  of  the  treatment  of  virtual  method  calls,  expressed  directly  in  the 
constraint  language  of  SEMI,  is  given  below  in  Section  6.4.7. 

6.3.5.3  Type  Checking/Inference  For  Nonstatic  Methods 

Given  the  above  types  and  assuming  standard  type  checking  rules,  it  is  straightforward  to 
show  that  the  types  are  consistent  with  the  code  and  each  other. 

For example, to typecheck getter, we observe that the type of o is $t$, and therefore the
type of o.getX is $(t) \to (b, e)$. In the call to o.getX, we indeed pass in a parameter of
type $t$ (namely o). Furthermore, the result returned from getX has type $(b, e)$, which
correctly matches the return type of getter.

Note that getter is typechecked (and can have its type inferred) independently of any
information about the callee in the call to getX (MyObj_getX). All that is required is that
the type of the getX method recorded in the type of getter’s o parameter is consistent
with the actual usage of that method within getter. The type information recorded for
getX in the type signature of getter effectively describes how the method is used by
getter.

To check the type of obj in main, observe that it is constrained both by the initialization of
obj as a new MyObj object and by obj being passed as a parameter to getter. The
initialization of obj requires obj’s type $u$ to be the type of an object containing a getX
method and a MyObj_fieldX field. Furthermore, the type of the getX method within $u$
must be a polymorphic instance of the type of MyObj_getX (which is
“$\forall a, e, p.\ (\{\ \mathtt{MyObj\_fieldX}: a;\ p\ \}) \to (a, e)$”). If no method call was made on the
object, we could therefore just set
$u = \{\ \mathtt{getX}: (\{\ \mathtt{MyObj\_fieldX}: c;\ p\ \}) \to (c, e);\ \mathtt{MyObj\_fieldX}: d;\ p'\ \}$
(for some $c, d, e, p, p'$).

However, the type of obj is also constrained by the call to getter(obj). This call
requires $u$ to be some polymorphic instance of getter’s parameter type $t$, where
$t = \{\ \mathtt{getX}: (t) \to (b, e);\ p\ \}$. Because the parameter type of $t$’s getX method is $t$ itself,
the parameter type of $u$’s getX method is also required to be $u$ itself. Unifying this constraint
with the constraints mentioned above requires $u$ to be of the form
$\{\ \mathtt{getX}: (u) \to (c, e);\ \mathtt{MyObj\_fieldX}: c;\ p\ \}$.

Note also that the type signature of getter promises that its result has the same type ($b$)
as the result of its object parameter’s getX method. Therefore in main we learn that the
result of the call to getter will have type $c$.

6.3.5.4  Treatment  Of  Polymorphism 

The  call  to  getter  in  main  is  treated  polymorphically;  the  caller’s  parameter  and  result 
types  are  required  to  be  some  polymorphic  instance  of  the  callee’s  types.  On  the  other  hand 
the  call  to  getX  from  getter  is  not  treated  polymorphically;  the  caller  and  callee  types 
must  be  identical. 

The  technical  reason  for  this  distinction  is  that  we  can  only  polymorphically  generalize  type 
variables  that  are  not  bound  in  the  current  type  environment.  All  the  type  variables  in  the 
type  assigned  to  getter  are  polymorphically  generalized,  because  they  do  not  occur 
anywhere  outside  the  definition  of  getter.  (Intuitively,  this  means  that  the  assignment  of 
types  to  these  variables  is  independent  of  anything  outside  getter,  and  therefore  different 
types can be chosen for each use of getter.) On the other hand, in getter, the type
variables in the type of the callee o.getX are bound in the type environment; in particular
they occur inside getter’s parameter type. (Intuitively, this means that the assignment of
types to these type variables is constrained by the caller of getter. For example, the caller
of getter might pass in an object whose getX method always returns an integer.
Obviously it would be unsafe to allow getter to choose different return types for each
call to getX.)

6.3.5.5  Polymorphism  In  Object  Creation 

When  an  object  is  created,  such  as  when  obj  is  created  in  main,  its  field  and  method  slots 
are always initialized with constant values: either zero scalar values, or the functions that
implement  the  methods  supported  by  the  object.  The  usage  of  these  constant  values  is 
always  treated  polymorphically.  Therefore  if  a  method  implementation  is  inherited  into 
multiple  classes,  which  are  instantiated  at  multiple  sites,  the  references  to  the  method 
implementation at each site can be given distinct types. Similarly, fields of objects of the
same  class  created  at  different  sites  can  be  given  distinct  types. 

6.3.6  Extensible  Records  and  Object  Classes 

Consider the code in Figure 6-5. This example demonstrates the use of subclass
polymorphism with subclasses having distinct fields.


class SuperObj {
    abstract int getX(SuperObj this);
}

class MyObj {
    int fieldX;

    int getX(MyObj this) {
        load this; getfield MyObj_fieldX; ireturn;
    }
}

class YourObj {
    int otherX;

    int getX(YourObj this) {
        load this; getfield YourObj_otherX; ireturn;
    }
}

static int getter(SuperObj obj) {
    load obj; invokevirtual getX; ireturn;
}

static int main() {
    if ... then new MyObj else new YourObj;
    invokevirtual getX; ireturn;
}

Figure  6-5.  Extensible  Record  Example 

The  following  types  are  inferred: 

MyObj_getX: $\forall a, e, p.\ (\{\ \mathtt{MyObj\_fieldX}: a;\ p\ \}) \to (a, e)$

YourObj_getX: $\forall b, e, p.\ (\{\ \mathtt{YourObj\_otherX}: b;\ p\ \}) \to (b, e)$

getter: $\forall c, e, p.\ (t) \to (c, e)$ where $t = \{\ \mathtt{getX}: (t) \to (c, e);\ p\ \}$

object in main: $u$ where $u = \{\ \mathtt{getX}: (u) \to (c, e);\ \mathtt{MyObj\_fieldX}: c;\ \mathtt{YourObj\_otherX}: c;\ p\ \}$
(for some $c, e, p$)

In general, if Java declares a variable to be of class C (here, SuperObj), then any fields
and methods belonging to C or any subclass of C (here, MyObj and YourObj) can appear
in the type inferred for the variable. This can lead to the slightly counterintuitive situation
where variables having the least constraining Java types (e.g., variables of type Object)
have the most complex inferred types.
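
The way two open record types combine when a single variable may hold either kind of object can be sketched roughly as follows. This is a simplified illustration under my own assumptions: records are flat dictionaries from slot names to opaque type names (`m1`, `m2`, `a`, `b` are hypothetical), and the helper `merge_open_records` only reports the equalities a real unifier would go on to solve.

```python
def merge_open_records(r1, r2):
    """Unify two open records (dict: slot -> type name).

    Returns the merged slots and the list of type equalities forced by
    slots that appear in both records.
    """
    merged = dict(r1)
    equalities = []
    for slot, ty in r2.items():
        if slot in merged and merged[slot] != ty:
            equalities.append((merged[slot], ty))  # shared slot: types must agree
        merged.setdefault(slot, ty)                # new slot: extend the record
    return merged, equalities

# Hypothetical record types for the two kinds of object created in main.
my_obj = {"getX": "m1", "MyObj_fieldX": "a"}
your_obj = {"getX": "m2", "YourObj_otherX": "b"}
merged, eqs = merge_open_records(my_obj, your_obj)
```

The merged record carries the fields of both classes, while the shared getX slot forces the two method types to be unified. This mirrors why the type inferred for the object in main mentions both MyObj_fieldX and YourObj_otherX.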
6.3.7 Mutability

Global  variables  and  fields  of  objects  are  mutable.  However,  in  the  type  system  I  have  not 
distinguished  mutable  and  immutable  slots  of  records.  The  distinction  is  irrelevant  because 
whenever  a  slot  of  a  record  is  accessed,  the  record  has  a  monomorphic  type  and  therefore 
the  type  of  the  slot  is  monomorphic.  Thus  two  accesses  to  the  same  slot  of  a  record,  whether 
reads  or  writes,  always  get  the  same  type  for  the  slot.  (The  fatal  error  would  be  to  treat  a 
mutable  slot  of  a  record  as  polymorphic;  we  might  store  a  value  in  the  slot  with  one  type, 
retrieve  the  value  with  another  type,  and  thus  destroy  soundness.) 

6.3.8  Control  Flow 

Internally,  a  Java  bytecode  method  is  simply  an  array  of  bytecode  instructions  with 
arbitrary  control  flow  between  them.  SEMI  treats  each  bytecode  instruction  as  a  local 
function which takes the values of the current working stack and local variables as
parameters, and calls the successor instruction(s) as tail calls. Each local function returns the final
result  of  the  method  and  its  thrown  exception. 

The  stack  is  passed  as  a  list,  so  that  “push”  operations  become  “cons”  and  “pop”  operations 
become  “head/tail”.  Local  variables  are  passed  in  a  record. 

A  method  executes  by  calling  the  local  function  for  the  first  instruction,  with  method 
parameters  placed  into  local  variables  (as  required  by  the  Java  bytecode  semantics). 

For  example,  the  method 

int  add3(int  x)  {  load  x;  bipush  3;  iadd;  return;  } 

translates  to 

fun add3(this, x) =
    let fun f_0(st, (v0, v1)) = f_1(v1::st, (v0, v1))
    and fun f_1(st, (v0, v1)) = f_2(3::st, (v0, v1))
    and fun f_2(a::b::st, (v0, v1)) = f_3((a+b)::st, (v0, v1))
    and fun f_3(v::st, (v0, v1)) = (v, ...)
    in f_0([], {#0: this; #1: x})

The  encoding  is  simple  and  regular. 
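
To make the encoding concrete, here is a minimal executable sketch of the add3 translation in Python. It is an illustration only: I assume a Python list stands in for the cons-list stack (head at index 0), a tuple stands in for the local variable record, and `None` stands in for the elided exception component of the result.

```python
def add3(this, x):
    # Each local function stands for one bytecode instruction and finishes by
    # calling its successor, so the last instruction's value is the method result.
    def f_0(st, ls):            # load x  (x is local variable #1)
        return f_1([ls[1]] + st, ls)

    def f_1(st, ls):            # bipush 3
        return f_2([3] + st, ls)

    def f_2(st, ls):            # iadd: pop two values, push their sum
        a, b, *rest = st
        return f_3([a + b] + rest, ls)

    def f_3(st, ls):            # return: (result, exception)
        return (st[0], None)

    # Entry: empty stack, parameters placed into local variables #0 and #1.
    return f_0([], (this, x))
```

For instance, `add3(None, 4)` evaluates to `(7, None)`.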

All  kinds  of  control  flow  are  easily  handled.  The  method 

static  int  isequal(int  x,  int  y)  { 

0:  load  0;  1:  load  1;  2:  if_cmpeq  6; 

3:  bipush  0;  4:  store  2;  5:  goto  8; 

6:  bipush  1;  7:  store  2; 

8:  load  2;  9:  return;  } 

translates  to 
fun isequal(x, y) =
    let fun f_0(st, (v0, v1, v2)) = f_1(v0::st, (v0, v1, v2))
    and fun f_1(st, (v0, v1, v2)) = f_2(v1::st, (v0, v1, v2))
    and fun f_2(v1::v0::st, ls) = if v1 = v0 then f_6(st, ls) else f_3(st, ls)
    and fun f_3(st, ls) = f_4(0::st, ls)
    and fun f_4(a::st, (v0, v1, v2)) = f_5(st, (v0, v1, a))
    and fun f_5(st, ls) = f_8(st, ls)
    and fun f_6(st, ls) = f_7(1::st, ls)
    and fun f_7(b::st, (v0, v1, v2)) = f_8(st, (v0, v1, b))
    and fun f_8(st, (v0, v1, v2)) = f_9(v2::st, (v0, v1, v2))
    and fun f_9(v::st, ls) = (v, ...)

These  calls  between  instructions  could  be  treated  polymorphically.  In  theory  some 
accuracy  might  be  gained  because  at  control  flow  merge  points,  the  state  along  each 
incoming  control  flow  edge  could  be  given  a  different  type,  each  an  instance  of  the  type  of 
the  state  at  the  destination  instruction.  In  practice  this  increased  accuracy  has  not  proved 
useful,  and  even  with  some  obvious  optimizations  (e.g.,  only  allow  polymorphism  for  calls 
to  instructions  representing  control  flow  merge  points),  it  has  proved  prohibitively 
expensive.  Therefore  in  practice  SEMI  treats  these  transfers  monomorphically  (making  the 
types  of  the  actual  parameters  and  results  equal  to  the  types  of  the  formal  parameters  and 
results,  rather  than  instances  of  those  types).  However,  in  the  description  below,  I  use 
polymorphic  constraints  for  instruction  transfers  to  show  that  they  are  sound. 

However,  even  under  monomorphism  it  is  still  the  case  that  a  stack  location  or  local 
variable  can  be  given  different  types  at  different  program  points.  For  example,  local 
variable #2 has a different type after it is assigned than it had before the assignment.
This  has  the  same  effect  as  translating  the  program  into  Single  Static  Assignment  form 
before  performing  the  analysis,  but  it  arises  naturally  from  the  encoding. 
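
The isequal method can be run through the same encoding to see branching, merging and per-program-point locals in action. The following Python sketch is again only illustrative; the list-as-stack and tuple-as-locals representations, the `None` placeholder for the unassigned local #2, and the `None` exception component are my assumptions.

```python
def isequal(x, y):
    def f_0(st, ls): return f_1([ls[0]] + st, ls)                # 0: load 0
    def f_1(st, ls): return f_2([ls[1]] + st, ls)                # 1: load 1

    def f_2(st, ls):                                              # 2: if_cmpeq 6
        b, a, *rest = st
        return f_6(rest, ls) if a == b else f_3(rest, ls)

    def f_3(st, ls): return f_4([0] + st, ls)                     # 3: bipush 0
    def f_4(st, ls): return f_5(st[1:], (ls[0], ls[1], st[0]))    # 4: store 2
    def f_5(st, ls): return f_8(st, ls)                           # 5: goto 8
    def f_6(st, ls): return f_7([1] + st, ls)                     # 6: bipush 1
    def f_7(st, ls): return f_8(st[1:], (ls[0], ls[1], st[0]))    # 7: store 2
    def f_8(st, ls): return f_9([ls[2]] + st, ls)                 # 8: load 2
    def f_9(st, ls): return (st[0], None)                         # 9: return

    # Local #2 starts out unassigned; each store builds a new locals tuple.
    return f_0([], (x, y, None))
```

Because every store passes a fresh locals tuple to its successor, the value held in local #2 before the store (here `None`) and after it (an integer) are simply different parameters of different functions, which is the effect described above.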

6.3.9  Exception  Handling 

Exception  handling  is  performed  in  a  way  similar  to  other  control  transfers.  In  each  method, 
every  instruction  which  might  throw  an  exception,  or  receive  a  propagated  exception 
(which  is  actually  all  instructions,  because  the  virtual  machine  can  throw  an  “internal  error” 
exception  at  any  instruction),  can  transfer  control  to  any  applicable  exception  handlers 
defined  in  the  method.  The  translation  does  not  specify  when  an  exception  is  thrown;  for  a 
given  instruction,  the  choice  of  whether  to  throw  an  exception  or  continue  normal  execution 
is  always  considered  to  be  nondeterministic  (unless  the  instruction  is  an  unconditional 
athrow  instruction).  Control  transfer  to  an  exception  handler  puts  the  current  exception 
object  onto  the  top  of  the  working  stack,  as  specified  by  the  Java  bytecode  semantics. 

Most  methods  do  not  have  any  explicit  exception  handlers.  However,  all  methods  must  be 
able  to  propagate  thrown  exceptions  to  the  caller.  Each  instruction  which  can  throw  an 
exception  (or  receive  a  propagated  exception)  can  nondeterministically  choose  to  return  the 
exception  value  immediately  as  the  method  result,  thus  propagating  the  exception.  The 
following  code  shows  an  example  of  such  behavior: 
fun callAll() =
    let (result1, exn1) = call1()
    in if ? then (..., exn1) else
        let (result2, exn2) = call2(result1)
        in if ? then (..., exn2) else
            (result2, ...)
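
One way to read the nondeterministic "?" tests is to enumerate every way of resolving them. The Python sketch below does this for callAll; the bodies of `call1` and `call2`, the string placeholders, and the use of `None` for the elided components are hypothetical and serve only to make the example runnable.

```python
from itertools import product

def call1():
    return ("r1", "exn1")            # hypothetical (result, exception) pair

def call2(arg):
    return (f"r2({arg})", "exn2")    # hypothetical (result, exception) pair

def call_all(choices):
    """Run callAll with the two '?' tests resolved by the booleans in `choices`."""
    result1, exn1 = call1()
    if choices[0]:
        return (None, exn1)          # propagate the first callee's exception
    result2, exn2 = call2(result1)
    if choices[1]:
        return (None, exn2)          # propagate the second callee's exception
    return (result2, None)           # normal return

# All behaviors the analysis must account for.
outcomes = {call_all(c) for c in product([False, True], repeat=2)}
```

The analysis does not pick one of these outcomes; it must be sound for all of them, which is why each callee's exception value flows into the caller's exception result.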

6.4  Initial  Constraint  Set 

Consider  a  program  P  in  the  Micro  Java  Bytecode  language,  as  defined  in  Section  3.2.2. 

6.4.1  Constraint  Variables 

The  set  of  initial  constraints  for  P  makes  use  of  the  following  variables: 

•  $S_{pc}$: the variable for the working stack on entry to instruction pc

   The stack is a list, so its variable can have two components: “head”, representing the
   top of the stack, and “tail”, representing the rest of the stack.

•  $L_{pc}$: the variable for the local variable file on entry to instruction pc

   The local variables are indexed by number, so $L_{pc}$ has numbered components, one for
   each local variable used.

•  $X_{pc}$: the variable for the exception thrown by the code starting at pc

•  $G_{pc}$: the variable for the global variables on entry to instruction pc

   This variable has one component for each static field in the program.

•  $R_{pc}$: the variable for the value that the code at pc eventually returns from the method

•  $S'_{pc}$, $L'_{pc}$: the variables for the state on leaving instruction pc

•  $C_{classID}$: the variable representing the prototypical object of class classID

•  $M_{methodImpl}$: the variable representing the type of the method methodImpl

•  $T_{pc,label}$: variables used by the instruction at pc for internal purposes

•  $N_{classID,methodID}$: the variable representing the type of inherited method methodID in
   class classID

•  $F_{classID,fieldID}$: the variable representing the type of field fieldID in class classID

•  $F_{fieldID}$: the variable representing the type of static field fieldID

•  Err: the variable representing the exceptions which may be thrown spontaneously by
   the virtual machine

•  $S'_{\text{exn-}pc\text{-}classID}$: these variables represent the new stack on transfer to an exception
   handler when exception classID is thrown at pc

6.4.2  Instance  Labels 

SEMI  uses  the  following  instance  labels: 
•  pc-pc': an instance representing the use of (transfer of control to) one instruction from
   another.

   SEMI treats each instruction as a function; transferring control from one instruction to
   another corresponds to a call to the destination instruction’s function, passing in the
   current local variables, working stack elements and global variables as parameters. These
   “functions” do not return until the entire method returns; the returned value is the result
   of the method. The functions are treated as polymorphic, so different information can
   be inferred for an instruction for each incoming control path.

•  pc: an instance representing the use of a static method (when pc corresponds to an
   invokestatic instruction) or the creation of a new object (when pc corresponds to
   a new instruction).

   A method can be thought of as a polymorphic function. Note that global variables are
   treated as the fields of a “globals object” which is passed as a parameter to every such
   function, so every such function is self-contained and has no references to any
   environment. A static call to a method is a direct invocation of the function, and so gets a new
   polymorphic instance. Creation of an object can be thought of as cloning a prototype
   object, and also gets a new polymorphic instance.

•  classID-methodID: an instance representing the inheritance of a method
   implementation by a class.

   Each prototype object for a class can be thought of as a record, with one slot for each
   signature of the methods implemented by the class. The putative definition of the
   prototype assigns the function associated with each inherited method implementation to the
   slot for its signature. Since one method implementation can be inherited into multiple
   classes, each class which uses a method implementation gets a new polymorphic
   instance of the method.

•  err-pc: an instance representing the creation of a spontaneously thrown exception at a
   particular program point.

   This is similar to the instance induced when an object is created by new.

•  err-classID: an instance representing the creation of a new object when a spontaneous
   exception is thrown.

   A spontaneous exception creates an object which has one of many possible classes. The
   variable “Err” represents the type of an object which could be any one of these classes,
   and therefore “Err” is an instance of the object prototype for each spontaneous
   exception class. Each of these instances needs a different label, err-classID.

6.4.3  Component  Labels 

I  make  use  of  the  following  component  labels: 

•  param-i: a parameter to a method.

•  globals:  the  global  variables  passed  into  a  method. 

•  result:  the  result  returned  by  a  method. 

•  exn:  the  exception  thrown  by  a  method  (essentially,  an  alternative  result). 

•  i:  a  local  variable  index. 
•  fieldID: a field slot of an object.

•  methodID: a method slot of an object.

•  head:  the  head  element  of  a  stack,  treated  as  a  list. 

•  tail:  the  tail  of  a  stack. 


6.4.4  Program  Constraints 

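The set of initial constraints assigned to an MJBC program is given as

$\text{InitialConstraints}(P) =$
$\quad (\bigcup\ \{\ \text{IConstraints}(pc) \mid pc \in \text{dom Instruction}\ \})$
$\quad \cup\ (\bigcup\ \{\ \text{MInvocation}(methodImpl) \mid (methodImpl, 0) \in \text{dom Instruction}\ \})$
$\quad \cup\ (\bigcup\ \{\ \text{MDispatch}(classID, methodID) \mid (classID, methodID) \in \text{dom Dispatch}\ \})$
$\quad \cup\ (\bigcup\ \{\ \text{IFields}(classID) \mid classID \in \text{dom InitFields}\ \})$
$\quad \cup\ (\bigcup\ \{\ \text{CatchConstraints}(pc, classID) \mid (pc, classID) \in \text{dom CatchBlockOffset}\ \})$
$\quad \cup\ (\bigcup\ \{\ \{ G_{(\text{Main},0)} \rhd_{fieldID} F_{fieldID} \} \mid fieldID \in \text{dom InitStaticFields}\ \})$
$\quad \cup\ (\bigcup\ \{\ \{ \text{Err} \preceq_{\text{err-}pc} X_{pc} \} \mid pc \in \text{dom Instruction}\ \})$
$\quad \cup\ (\bigcup\ \{\ \{ C_{classID} \preceq_{\text{err-}classID} \text{Err} \} \mid classID \in \text{ErrorClassIDs}\ \})$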

This  definition  uses  several  functions: 

•  IConstraints(pc) is a partial function that assigns to each pc the initial constraints
induced  by  the  instruction  at  pc.  IConstraints  is  defined  by  the  rules  in  Table  6-1. 

•  MInvocation  computes  the  constraints  needed  to  hook  up  the  type  of  a  method  body  m 
to  the  types  at  the  method  definition. 


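$\text{MInvocation}(m) = \{\ M_m \rhd_{\text{param-0}} T_{m,p0},\ M_m \rhd_{\text{param-1}} T_{m,p1},\ M_m \rhd_{\text{globals}} G_{(m,0)},$
$\quad M_m \rhd_{\text{exn}} X_{(m,0)},\ M_m \rhd_{\text{result}} R_{(m,0)},\ L_{(m,0)} \rhd_{0} T_{m,p0},\ L_{(m,0)} \rhd_{1} T_{m,p1}\ \}$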


•  MDispatch computes the constraints needed to implant the type of the method
   implementation methodID into the type of the prototype object for class classID.

$\text{MDispatch}(classID, methodID) =$
$\quad \{\ M_{\text{Dispatch}(classID, methodID)} \preceq_{classID\text{-}methodID} N_{classID,methodID},$
$\quad\ \ C_{classID} \rhd_{methodID} N_{classID,methodID}\ \}$

•  IFields computes constraints ensuring that every object field has a type.

$\text{IFields}(classID) = \{\ C_{classID} \rhd_{fieldID} F_{classID,fieldID} \mid fieldID \in \text{dom InitFields}(classID)\ \}$
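Instruction(pc)          IConstraints(pc)

aconst_null              $\{ S'_{pc} \rhd_{\text{tail}} S_{pc},\ S_{pc+1} \rhd_{\text{head}} T_{pc,v} \}$
                         $\cup\ \text{Succ}(pc, pc+1, S'_{pc}, L_{pc})$

bipush byte              $\{ S'_{pc} \rhd_{\text{tail}} S_{pc},\ S_{pc+1} \rhd_{\text{head}} T_{pc,v} \}$
                         $\cup\ \text{Succ}(pc, pc+1, S'_{pc}, L_{pc})$

iadd                     $\{ S_{pc} \rhd_{\text{tail}} T_{pc,t1},\ T_{pc,t1} \rhd_{\text{tail}} T_{pc,t2},\ S'_{pc} \rhd_{\text{tail}} T_{pc,t2},$
                         $\ \ S_{pc+1} \rhd_{\text{head}} T_{pc,v} \}$
                         $\cup\ \text{Succ}(pc, pc+1, S'_{pc}, L_{pc})$

load index               $\{ L_{pc} \rhd_{index} T_{pc,v},\ S'_{pc} \rhd_{\text{tail}} S_{pc},\ S'_{pc} \rhd_{\text{head}} T_{pc,v} \}$
                         $\cup\ \text{Succ}(pc, pc+1, S'_{pc}, L_{pc})$

store index              $\{ S_{pc} \rhd_{\text{tail}} S'_{pc},\ S_{pc} \rhd_{\text{head}} T_{pc,v},\ L'_{pc} \rhd_{index} T_{pc,v} \}$
                         $\cup\ \{ L'_{pc} \rhd_{i} T_{pc,i} \mid i \in \text{LocalNames}(pc) \land i \neq index \}$
                         $\cup\ \{ L_{pc} \rhd_{i} T_{pc,i} \mid i \in \text{LocalNames}(pc) \land i \neq index \}$
                         $\cup\ \text{Succ}(pc, pc+1, S'_{pc}, L'_{pc})$

if_cmpeq offset          $\{ S_{pc} \rhd_{\text{tail}} T_{pc,t},\ T_{pc,t} \rhd_{\text{tail}} S'_{pc} \}$
                         $\cup\ \text{Succ}(pc, pc+1, S'_{pc}, L_{pc})$
                         $\cup\ \text{Succ}(pc, (\text{CodeLocMethod}(pc), offset), S'_{pc}, L_{pc})$

goto offset              $\text{Succ}(pc, (\text{CodeLocMethod}(pc), offset), S_{pc}, L_{pc})$

return                   $\{ S_{pc} \rhd_{\text{head}} R_{pc} \}$

new classID              $\{ S'_{pc} \rhd_{\text{tail}} S_{pc},\ S_{pc+1} \rhd_{\text{head}} T_{pc,v},\ C_{classID} \preceq_{pc} T_{pc,v} \}$
                         $\cup\ \text{Succ}(pc, pc+1, S'_{pc}, L_{pc})$  (see note a)

getfield fieldID         $\{ S_{pc} \rhd_{\text{tail}} T_{pc,t},\ S_{pc} \rhd_{\text{head}} T_{pc,obj},\ T_{pc,obj} \rhd_{fieldID} T_{pc,v},$
                         $\ \ S'_{pc} \rhd_{\text{head}} T_{pc,v},\ S'_{pc} \rhd_{\text{tail}} T_{pc,t} \}$
                         $\cup\ \text{Succ}(pc, pc+1, S'_{pc}, L_{pc})$

putfield fieldID         $\{ S_{pc} \rhd_{\text{tail}} T_{pc,t},\ S_{pc} \rhd_{\text{head}} T_{pc,v},\ T_{pc,t} \rhd_{\text{tail}} S'_{pc},$
                         $\ \ T_{pc,t} \rhd_{\text{head}} T_{pc,obj},\ T_{pc,obj} \rhd_{fieldID} T_{pc,v} \}$
                         $\cup\ \text{Succ}(pc, pc+1, S'_{pc}, L_{pc})$

getstatic fieldID        $\{ G_{pc} \rhd_{fieldID} T_{pc,v},\ S'_{pc} \rhd_{\text{tail}} S_{pc},\ S'_{pc} \rhd_{\text{head}} T_{pc,v} \}$
                         $\cup\ \text{Succ}(pc, pc+1, S'_{pc}, L_{pc})$

putstatic fieldID        $\{ S_{pc} \rhd_{\text{tail}} S'_{pc},\ S_{pc} \rhd_{\text{head}} T_{pc,v},\ G_{pc} \rhd_{fieldID} T_{pc,v} \}$
                         $\cup\ \text{Succ}(pc, pc+1, S'_{pc}, L_{pc})$

invokevirtual methodID   $\{ S_{pc} \rhd_{\text{tail}} T_{pc,t1},\ S_{pc} \rhd_{\text{head}} T_{pc,v1},\ T_{pc,t1} \rhd_{\text{tail}} T_{pc,t2},$
                         $\ \ T_{pc,t1} \rhd_{\text{head}} T_{pc,v0},\ T_{pc,v0} \rhd_{methodID} T_{pc,m},$
                         $\ \ S'_{pc} \rhd_{\text{tail}} T_{pc,t2},\ S'_{pc} \rhd_{\text{head}} T_{pc,r} \}$
                         $\cup\ \text{MethodCall}(T_{pc,m}, T_{pc,v0}, T_{pc,v1}, G_{pc}, X_{pc}, T_{pc,r})$
                         $\cup\ \text{Succ}(pc, pc+1, S'_{pc}, L_{pc})$

invokestatic methodImpl  $\{ S_{pc} \rhd_{\text{tail}} T_{pc,t1},\ S_{pc} \rhd_{\text{head}} T_{pc,v1},\ T_{pc,t1} \rhd_{\text{tail}} T_{pc,t2},$
                         $\ \ T_{pc,t1} \rhd_{\text{head}} T_{pc,v0},\ M_{methodImpl} \preceq_{pc} T_{pc,m},$
                         $\ \ S'_{pc} \rhd_{\text{tail}} T_{pc,t2},\ S'_{pc} \rhd_{\text{head}} T_{pc,r} \}$
                         $\cup\ \text{MethodCall}(T_{pc,m}, T_{pc,v0}, T_{pc,v1}, G_{pc}, X_{pc}, T_{pc,r})$
                         $\cup\ \text{Succ}(pc, pc+1, S'_{pc}, L_{pc})$

checkcast classID        $\text{Succ}(pc, pc+1, S_{pc}, L_{pc})$

instanceof classID       $\{ S_{pc} \rhd_{\text{tail}} T_{pc,t},\ S'_{pc} \rhd_{\text{tail}} T_{pc,t},\ S_{pc+1} \rhd_{\text{head}} T_{pc,v} \}$
                         $\cup\ \text{Succ}(pc, pc+1, S'_{pc}, L_{pc})$

athrow                   $\{ S_{pc} \rhd_{\text{head}} X_{pc} \}$

Table 6-1. Instruction Constraints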


a. The object’s type variable is plugged into $S_{pc+1}$ instead of $S'_{pc}$, because for the
proofs, we need the field and method components of the variable to appear at $S_{pc+1}$.
The implementation instead has $S'_{pc} \rhd_{\text{head}} T_{pc,v}$. The discrepancy can probably be
corrected by adding “post-state” expressions to the expression syntax and extending
the soundness proof to cover them.

•  CatchConstraints  gives  constraints  capturing  the  control  flow  for  exceptions  of  class 
classID  thrown  at  pc  and  caught  in  the  method. 

$\text{CatchConstraints}(pc, classID) =$
$\quad \text{Succ}(pc, (\text{CodeLocMethod}(pc), \text{CatchBlockOffset}(pc, classID)), S'_{\text{exn-}pc\text{-}classID}, L_{pc})$
$\quad \cup\ \{ S'_{\text{exn-}pc\text{-}classID} \rhd_{\text{head}} X_{pc} \}$

The  last  three  sets  of  constraints  are: 

•  Constraints  ensuring  that  every  static  field  has  a  type. 

•  Constraints expressing the possibility that an exception may be spontaneously thrown
at any instruction.

•  Constraints  specifying  that  the  spontaneously  thrown  exceptions  are  objects  of  the 
classes  found  in  the  ErrorClassIDs. 

The  rules  in  Table  6-1  use  the  following  functions: 

•  LocalNames computes the indices of the local variables used in a method. LocalNames is
used to make sure the values of all local variables are carried forward correctly when
one of them is overwritten by a store instruction.

$\text{LocalNames}(method) = \{\ index \mid \exists i.\ \text{Instruction}(method, i) \in \{ \text{load } index,\ \text{store } index \}\ \}$

•  Succ  computes  the  constraints  that  arise  along  control  flow  paths  within  a  method, 
when  one  instruction  is  a  successor  of  another  in  the  control  flow  graph.  Succ  treats  the 
transfer  of  control  from  one  instruction  to  the  next  as  if  it  were  a  function  call,  so  that 
the instruction at from performs a “tail call” to the instruction at to to do the rest of the
computation for the current method. S and L are the types for the working stack and the
local variables respectively that are passed into to.

$\text{Succ}(from, to, S, L) = \{\ S_{to} \preceq_{from\text{-}to} S,\ L_{to} \preceq_{from\text{-}to} L,\ G_{to} \preceq_{from\text{-}to} G_{from},$
$\quad X_{to} \preceq_{from\text{-}to} X_{from},\ R_{to} \preceq_{from\text{-}to} R_{from}\ \}$


•  MethodCall computes the constraints needed to hook up a method call at a call site. M
is the type for the method being called. P0 and P1 are the types of the parameters being
passed in. G is the type of the globals object being passed in. X and R are the types of
the exception and normal result returned, respectively.

$\text{MethodCall}(M, P0, P1, G, X, R) =$
$\quad \{\ M \rhd_{\text{param-0}} P0,\ M \rhd_{\text{param-1}} P1,\ M \rhd_{\text{globals}} G,\ M \rhd_{\text{exn}} X,\ M \rhd_{\text{result}} R\ \}$
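
The helper functions above are simple enough to sketch as executable constraint generators. The Python below is an illustration under my own assumptions, not SEMI's code: constraints are plain tuples, variables are strings, program counters are integers rather than (method, offset) pairs, and I read Succ as making the destination's variables the general side of each instance constraint. The names `comp`, `inst`, `succ`, `method_call` and `bipush` are hypothetical.

```python
def comp(parent, label, child):
    """Component constraint: `child` is the `label` component of `parent`."""
    return ("comp", parent, label, child)

def inst(general, label, specific):
    """Instance constraint: `specific` is the `label` instance of `general`."""
    return ("inst", general, label, specific)

def succ(frm, to, S, L):
    """Constraints for a control transfer frm -> to passing stack S and locals L."""
    lbl = f"{frm}-{to}"
    return {
        inst(f"S_{to}", lbl, S),
        inst(f"L_{to}", lbl, L),
        inst(f"G_{to}", lbl, f"G_{frm}"),
        inst(f"X_{to}", lbl, f"X_{frm}"),
        inst(f"R_{to}", lbl, f"R_{frm}"),
    }

def method_call(M, P0, P1, G, X, R):
    """Constraints hooking a call site up to the method type M."""
    return {
        comp(M, "param-0", P0), comp(M, "param-1", P1),
        comp(M, "globals", G), comp(M, "exn", X), comp(M, "result", R),
    }

def bipush(pc):
    """Initial constraints for a `bipush` at integer pc: push one value, fall through."""
    return {
        comp(f"S'_{pc}", "tail", f"S_{pc}"),
        comp(f"S_{pc + 1}", "head", f"T_{pc},v"),
    } | succ(pc, pc + 1, f"S'_{pc}", f"L_{pc}")
```

For example, `bipush(0)` yields two component constraints describing the new stack plus the five instance constraints of the transfer from instruction 0 to instruction 1.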


6.4.5  Query  Constraints 

Additional  constraints  must  be  added  to  the  set  C  to  support  queries  over  arbitrary  bytecode 
expressions.  These  constraints  depend  on  the  queried  expressions,  and  are  detailed  below 
in Section 6.5.3.2.


6.4.6  Canonical  Constraint  Set 

C is a canonical constraint set if

$\forall u, v.\ \{u = v\} \subseteq C \Rightarrow u = v$.

Lemma 6-1. Let a closed constraint set N be given. Let M be a map from variables to
variables such that

$\forall u, v.\ (\{u = v\} \subseteq N \lor u = v) \Leftrightarrow M(u) = M(v)$

M selects one representative element from each equivalence class. Such a map exists for
any choice of N, because the closure of N implies the relation = is an equivalence relation
in N (lacking only reflexivity, which I restore with the disjunction).

Let C be defined as:

$C = \{\ M(t) \rhd_c M(u) \mid \{t \rhd_c u\} \subseteq N\ \}$
$\quad \cup\ \{\ M(t) \preceq_i M(u) \mid \{t \preceq_i u\} \subseteq N\ \}$
$\quad \cup\ \{\ M(t) = M(t) \mid t \in \text{dom}\ M\ \}$

(C replaces each variable in N with the representative of its equivalence class.) Then C is
trivially canonical. Furthermore, C is closed.

Proof: I prove the closure condition that $\{t \rhd_c u,\ t \rhd_c v\} \subseteq C$ implies $\{u = v\} \subseteq C$.
Suppose $\{t \rhd_c u,\ t \rhd_c v\} \subseteq C$. Then

$\exists t', t'', u', v'.\ t = M(t') \land u = M(u') \land t = M(t'') \land v = M(v')$ where
$\{t' \rhd_c u',\ t'' \rhd_c v'\} \subseteq N$

By definition of M,

$M(t') = M(t'') \Rightarrow \{t' = t''\} \subseteq N \lor t' = t''$

In either case of the disjunction,

$\{t' \rhd_c u',\ t' \rhd_c v'\} \subseteq N$

By closure of N,

$\{u' = v'\} \subseteq N$

This gives

$M(u') = M(v')$, i.e., $u = v$ and therefore $\{u = v\} \subseteq C$

The other closure conditions follow similarly. ■

The  remainder  of  this  chapter  deals  with  canonical  closed  constraint  sets.  This  eliminates 
the  need  to  explicitly  deal  with  equivalence  constraints. 
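
The construction in Lemma 6-1 can be sketched with a union-find structure that picks the representative M(v) for each variable. This is a minimal illustration, not the implementation used in SEMI: the tuple encoding of constraints, the function name `canonicalize`, and the small example set are assumptions, and the input is taken to be closed already.

```python
def canonicalize(constraints):
    """Replace every variable by a representative of its equivalence class.

    Constraints are tuples: ("eq", u, v), ("comp", t, c, u) or ("inst", t, i, u).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for c in constraints:
        if c[0] == "eq":
            ru, rv = find(c[1]), find(c[2])
            if ru != rv:
                parent[ru] = rv             # merge the two classes
        else:
            find(c[1])                      # register both variables
            find(c[3])

    canonical = set()
    for c in constraints:
        if c[0] != "eq":
            canonical.add((c[0], find(c[1]), c[2], find(c[3])))
    for x in list(parent):
        rep = find(x)
        canonical.add(("eq", rep, rep))     # only trivial equalities remain
    return canonical

# Small example: u and v are equated, and both are the head component of t.
example = {
    ("comp", "t", "head", "u"), ("comp", "t", "head", "v"),
    ("eq", "u", "v"), ("eq", "v", "u"), ("eq", "u", "u"), ("eq", "v", "v"),
}
result = canonicalize(example)
```

After canonicalization the two component constraints collapse into one, and every remaining equality relates a variable to itself, so later stages never need to consult equivalence constraints.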

6.4.7  Example 

The  Java  code  in  Figure  6-6  would  generate  bytecode  as  shown  in  Table  6-2. 


class  X  { 

X  f (X  a)  {  return  this;  } 

static  X  g (X  c,  X  d)  {  return  c.f(d);  } 

static  X  main (X  b)  {  return  g (new  X(),  b) ;  } 

} 

Figure  6-6.  A  Simple  Java  Program 

For  this  program,  one  might  ask  “can  main’s  result  equal  the  new  X  object  it  creates?”  We 
shall see how this question is answered by computing initial constraints (shown in
Table 6-2) and then finding a closed form.

6.4.7.1 Initial Constraints

The  constraints  shown  in  Table  6-2  have  been  simplified  from  the  real  constraints  in  order 
to  make  the  example  simultaneously  tractable  and  interesting.  In  particular,  all  the 
“successor  instance”  constraints  have  been  replaced  with  equalities,  which  have  then  been 
eliminated  by  substitution.  All  of  the  constraints  within  methods  relating  to  the  stack  (S) 
and  local  variable  (L)  variables  have  been  solved  and  eliminated.  All  constraints  relating  to 
global  variables  and  exceptions  are  irrelevant  and  have  been  elided. 

6.4.7.2  Finding  a  Closed  Form 

SEMI  would  close  the  constraint  set  by  generating  additional  constraints,  as  follows: 

The equality constraints within f give

$\{\ M_f \rhd_{\text{result}} T_{f,p0}\ \}$.
Bytecode                      Induced Initial Constraints

class X {                     $M_f \preceq_{X\text{-}f} N_{X,f}$,   $C_X \rhd_{f} N_{X,f}$

  f(this, p1) {               $M_f \rhd_{\text{param-0}} T_{f,p0}$,   $M_f \rhd_{\text{param-1}} T_{f,p1}$
                              $M_f \rhd_{\text{result}} R_{(f,0)}$
    0: load this;             $R_{(f,0)} = T_{f,p0}$
    1: return;
  }

  static g(p0, p1) {          $M_g \rhd_{\text{param-0}} T_{g,p0}$,   $M_g \rhd_{\text{param-1}} T_{g,p1}$
                              $M_g \rhd_{\text{result}} R_{(g,0)}$
    0: load p0;
    1: load p1;
    2: invokevirtual f;       $T_{g,p0} \rhd_{f} T_{(g,2),m}$
                              $T_{(g,2),m} \rhd_{\text{param-1}} T_{g,p1}$,   $T_{(g,2),m} \rhd_{\text{param-0}} T_{g,p0}$
                              $T_{(g,2),m} \rhd_{\text{result}} R_{(g,0)}$
    3: return;
  }

  static main(p0) {           $M_{main} \rhd_{\text{param-0}} T_{main,p0}$,   $M_{main} \rhd_{\text{result}} R_{(main,0)}$
    0: new X;                 $C_X \preceq_{(main,0)} T_{(main,0),v}$
    1: load p0;
    2: invokestatic g;        $M_g \preceq_{(main,2)} T_{(main,2),m}$
                              $T_{(main,2),m} \rhd_{\text{param-1}} T_{main,p0}$,   $T_{(main,2),m} \rhd_{\text{param-0}} T_{(main,0),v}$
                              $T_{(main,2),m} \rhd_{\text{result}} R_{(main,0)}$
    3: return;
  }
}

Table 6-2. A Simple Bytecode Program and its Constraints


We propagate components of $M_f$ to $N_{X,f}$ (using $M_f \le_{X\text{-}f} N_{X,f}$), getting

$$\{ N_{X,f} \rhd_{\text{param-0}} v,\ N_{X,f} \rhd_{\text{result}} v \}$$

(for some $v$ where $T_{f,p0} \le_{X\text{-}f} v$).

Now we propagate $N_{X,f}$ and its components to the instance of $C_X$ in main (using
$C_X \le_{(main,0)} T_{(main,0),v}$), yielding

$$\{ T_{(main,0),v} \rhd_{f} s,\ s \rhd_{\text{param-0}} v',\ s \rhd_{\text{result}} v' \}$$

(for some $s$ and $v'$ where $N_{X,f} \le_{(main,0)} s$ and $v \le_{(main,0)} v'$).

In other words, we know in main that the object’s f method aliases its first parameter and
result. Now we need to work on g. The constraints for g contain

$$\{ T_{g,p0} \rhd_{f} T_{(g,2),m},\ T_{(g,2),m} \rhd_{\text{param-0}} T_{g,p0},\ T_{(g,2),m} \rhd_{\text{result}} R_{(g,0)} \}$$

So inside g, we know that we pass p0 into p0’s f method, and the result of that method is
returned from g. We do not assume anything else about f here.

We propagate g’s constraints to main, obtaining

$$\{ T_{g,p0} \le_{(main,2)} T_{(main,0),v},\ R_{(g,0)} \le_{(main,2)} R_{(main,0)} \}$$

From here we get

$$\{ T_{(main,0),v} \rhd_{f} u,\ u \rhd_{\text{param-0}} T_{(main,0),v},\ u \rhd_{\text{result}} R_{(main,0)} \}$$

(for some $u$ where $T_{(g,2),m} \le_{(main,2)} u$).

Now $T_{(main,0),v} \rhd_{f} u$ and $T_{(main,0),v} \rhd_{f} s$ require us to set

$$\{ u = s \}$$

In other words, we have “discovered” the implementation of f that g uses.

From the param-0 and result components of $u$ and $s$, we get

$$\{ v' = T_{(main,0),v},\ v' = R_{(main,0)} \}$$

Thus

$$\{ R_{(main,0)} = T_{(main,0),v} \}$$

Because the result of new X in main is assigned type $T_{(main,0),v}$, the conclusion is that the
result of main may be the new X.
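The conclusion can be checked against a concrete run. The following is a minimal sketch, assuming a renamed copy of the class from Figure 6-6 (`AliasDemoX`) with one hypothetical instrumentation field, `lastCreated`, that is not part of the original program. The field records the object allocated in `main` so that the returned reference can be compared with it.

```java
/** Renamed copy of class X from Figure 6-6, plus a hypothetical instrumentation field. */
class AliasDemoX {
    // Assumption for illustration only: remembers the object created inside main.
    static AliasDemoX lastCreated;

    AliasDemoX f(AliasDemoX a) { return this; }

    static AliasDemoX g(AliasDemoX c, AliasDemoX d) { return c.f(d); }

    static AliasDemoX main(AliasDemoX b) {
        lastCreated = new AliasDemoX();
        return g(lastCreated, b);
    }

    public static void main(String[] args) {
        AliasDemoX b = new AliasDemoX();
        AliasDemoX result = main(b);
        // The returned reference is the object allocated in main, not the argument.
        System.out.println(result == lastCreated);
        System.out.println(result == b);
    }
}
```

A single execution only witnesses the alias; the closed constraint set is what justifies the answer "may be the new X" without running the program.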

6.5  Extracting  the  VPR  Approximation 

In this section, I consider a canonical closed constraint set $C$, with associated map $M$
mapping from the original variables to the variables of $C$, and a pair of bytecode
expressions $e_1$ and $e_2$, and show how SEMI decides whether $e_1$ and $e_2$ are related in the
VPR approximation.

6.5.1  Overview 

Below, I define a judgement $(e, x) \rightarrow (u, x')$ that relates a bytecode expression $e$ in some
context $x$ to a SEMI variable $u$ with some “leftover context” $x'$, which is a suffix of $x$. A
context is a sequence of instance labels. For first-order code, it corresponds to a call stack,
each label naming a method call site or an instruction transition (recall that instruction
transitions are treated as tail calls).

The SEMI variable $u$ is referred to as the ground type of the expression in the context. A
ground type is obtained by first ignoring the context and computing the base type $t$ assigned
to the expression by SEMI, for example, the type variable assigned to a local variable. Then
we follow the chain of instances starting at $t$ and labelled by the instance labels in the
context $x$ as far as possible, to obtain $u$, the “most specific” instance of $t$ in context $x$.

The  “leftover  context”  x'  is  the  suffix  of  x  that  was  not  dereferenced;  it  represents  the 
outermost  context  at  which  some  instance  of  t  appears.  For  example,  when  u  occurs  as  part 
of  the  type  of  a  global  variable,  the  leftover  context  is  empty  because  an  instance  of  u  will 
occur  at  the  top  level. 
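The procedure described above, following instance labels as far as possible and returning what is left of the context, can be sketched as follows. This is an illustrative sketch, not SEMI's implementation: the class name `InstanceEvalSketch`, the `Ground` pair, and the string encoding of variables and labels are all assumptions, and the constraint set is assumed closed and canonical so that each (variable, label) pair has at most one instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of instance evaluation over a closed, canonical constraint set. */
class InstanceEvalSketch {
    /** Result pair: the ground variable and the leftover context. */
    static final class Ground {
        final String variable;
        final List<String> leftover;

        Ground(String variable, List<String> leftover) {
            this.variable = variable;
            this.leftover = leftover;
        }
    }

    // instances.get(u).get(i) == v encodes the instance constraint "v is the i-instance of u".
    private final Map<String, Map<String, String>> instances = new HashMap<>();

    void addInstance(String from, String label, String to) {
        instances.computeIfAbsent(from, k -> new HashMap<>()).put(label, to);
    }

    /** Follows instance labels from the front of the context for as long as they exist. */
    Ground evaluate(String base, List<String> context) {
        String current = base;
        int used = 0;
        while (used < context.size()) {
            Map<String, String> out = instances.get(current);
            String next = (out == null) ? null : out.get(context.get(used));
            if (next == null) {
                break; // no instance with this label: the rest of the context is left over
            }
            current = next;
            used++;
        }
        return new Ground(current, new ArrayList<>(context.subList(used, context.size())));
    }
}
```

Because the loop only ever consumes labels from the front, the leftover context is a suffix of the input context by construction.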

The analysis concludes $e_1 \leftrightarrow e_2$ if and only if

$$\exists u, x_1, x_2, x_1', x_2'.\ (e_1, x_1) \rightarrow (u, x_1') \land (e_2, x_2) \rightarrow (u, x_2')$$

The idea is that $u$ is the type of a witness value that causes $e_1$ and $e_2$ to be related. The
expressions are related if there is some plausible type $u$ that is an instance (in any contexts)
of both of the base types of $e_1$ and $e_2$.

6.5.2 Relating Bytecode Expressions to Variables

The inference rules in Figures 6-7, 6-8 and 6-9 define judgements of the form “$e \rightarrow u\langle c\rangle$”
(the “expression decomposition” relation), “$u\langle c\rangle \rightarrow u'$” (the “component evaluation”
relation), and “$(u', x) \rightarrow (v, x')$” (the “instance evaluation” relation). These judgements
are combined in Figure 6-10 to form the judgement “$(e, x) \rightarrow (v, x')$”. In this section I
prove a number of simple structural properties of these relations.


$$pc : \text{stack-0} \rightarrow S_{pc}\langle \text{head} :: \varepsilon\rangle \qquad\qquad pc : \text{exn} \rightarrow X_{pc}\langle \varepsilon\rangle$$

$$\frac{pc : \text{stack-}(n-1) \rightarrow S_{pc}\langle c'\rangle \qquad n > 0}{pc : \text{stack-}n \rightarrow S_{pc}\langle \text{tail} :: c'\rangle}$$

$$pc : \text{local-}n \rightarrow L_{pc}\langle n :: \varepsilon\rangle \qquad\qquad pc : \mathit{staticField} \rightarrow G_{pc}\langle \mathit{staticField} :: \varepsilon\rangle$$

$$\frac{pc : \mathit{exp} \rightarrow u\langle c_1 :: \ldots :: c_k :: \varepsilon\rangle}{pc : \mathit{exp}.\mathit{field} \rightarrow u\langle c_1 :: \ldots :: c_k :: \mathit{field} :: \varepsilon\rangle}$$

Figure 6-7. Rules defining the mapping from bytecode expressions to constraint variables and components

The expression decomposition relation maps a bytecode expression $e$ to a representation of
its base type, given as a basic type variable $u$ (one of $S_{pc}$, $G_{pc}$, $X_{pc}$, or $L_{pc}$, for some $pc$),
and a sequence of component labels $c$ that must be followed from $u$ to reach the base type
for $e$. The component evaluation relation then takes $u$ and dereferences the chain of
component labels to reach a variable $u'$ corresponding to the actual base type of $e$. Finally
the instance evaluation relation finds the most specific instance of $u'$ in context $x$.

The rest of this subsection proves several formal properties of these evaluation relations.
Many of them are generalizations of the closure properties of constraint sets.

$$\frac{e \rightarrow u\langle c\rangle \qquad M(u)\langle c\rangle \rightarrow u' \qquad (u', x) \rightarrow (v, x')}{(e, x) \rightarrow (v, x')}$$

Figure 6-10. Rule assigning a ground variable to an expression in a given context

Lemma 6-2. Existence property. Instance evaluation is total:

$$\forall u, x.\ \exists v, x'.\ (u, x) \rightarrow (v, x')$$

Proof: The proof is by induction on the length of $x$. The base case is trivial with $x = \varepsilon$
and $v = u$. For the induction step, suppose $x = i :: x''$; either $\forall u'.\ \{u \le_i u'\} \not\subseteq C$ or
$\exists u'.\ \{u \le_i u'\} \subseteq C$. In the former case, the result is trivial with $v = u$, $x' = x$. In the
latter case, the induction hypothesis gives $(u', x'') \rightarrow (v, x')$ for some $v$, $x'$ and the result
follows. ■

Lemma 6-3. Uniqueness properties. Each of the relations is a (partial) function.

Proof: It is clear that exactly one rule from Figure 6-7 applies for each bytecode expression
$e$. Therefore:

$$\forall e, u, u', c, c'.\ e \rightarrow u\langle c\rangle \land e \rightarrow u'\langle c'\rangle \Rightarrow u = u' \land c = c'$$

Exactly one rule from Figure 6-8 applies for each $u\langle c\rangle$. (Note that if $\{u \rhd_c u'\} \subseteq C$ and
$\{u \rhd_c u''\} \subseteq C$ then by closure of $C$, $\{u' = u''\} \subseteq C$ and hence $u' = u''$.) Therefore:

$$\forall u, c, v, v'.\ u\langle c\rangle \rightarrow v \land u\langle c\rangle \rightarrow v' \Rightarrow v = v'$$

Exactly one rule from Figure 6-9 applies for each $(u, x)$. (Note that if $\{u \le_i u'\} \subseteq C$ and
$\{u \le_i u''\} \subseteq C$ then by closure of $C$, $\{u' = u''\} \subseteq C$ and hence $u' = u''$.) Therefore:

$$\forall u, x, v, v'.\ (u, x) \rightarrow (v, x') \land (u, x) \rightarrow (v', x'') \Rightarrow v = v' \land x' = x''$$

Putting these together gives:

$$\forall e, x, v, v'.\ (e, x) \rightarrow (v, x') \land (e, x) \rightarrow (v', x'') \Rightarrow v = v' \land x' = x''$$ ■

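Lemma 6-4. Component transitivity property. Component evaluation respects
concatenation of component lists.

$$\forall u, c, c', v.\ u\langle c \oplus c'\rangle \rightarrow v \Leftrightarrow \exists t.\ u\langle c\rangle \rightarrow t \land t\langle c'\rangle \rightarrow v$$

Proof: The proof is by induction on the length of $c$. The base case $c = \varepsilon$ is trivial, with
$t = u$. For the induction step, suppose $c = c_1 :: c''$, where $c_1$ is the head label. In the
forward direction, we have $u\langle c_1 :: c'' \oplus c'\rangle \rightarrow v$. This requires
$\{u \rhd_{c_1} u'\} \subseteq C \land u'\langle c'' \oplus c'\rangle \rightarrow v$. By the induction
hypothesis, $\exists t.\ u'\langle c''\rangle \rightarrow t \land t\langle c'\rangle \rightarrow v$. But then $u\langle c_1 :: c''\rangle \rightarrow t$, as required.

In the reverse direction, we have $\exists t.\ u\langle c_1 :: c''\rangle \rightarrow t \land t\langle c'\rangle \rightarrow v$. This requires
$\{u \rhd_{c_1} u'\} \subseteq C \land u'\langle c''\rangle \rightarrow t$. By the induction hypothesis, $u'\langle c'' \oplus c'\rangle \rightarrow v$. Then
$u\langle c_1 :: c'' \oplus c'\rangle \rightarrow v$, as required. ■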

Lemma 6-5. Instance suffix property. In instance evaluation, the leftover context is a
suffix of the initial context. When the difference between those contexts is itself used as
the context for evaluation, the resulting leftover context is empty.

$$\forall u, x, u', x'.\ (u, x) \rightarrow (u', x') \Rightarrow (\exists v, y.\ x = y \oplus x' \land (u, y) \rightarrow (v, \varepsilon))$$

Proof: The proof is by induction on the length of $x$. For the base case $x = \varepsilon$, the result is
trivial, with $v = u$ and $y = \varepsilon$. For the induction step, suppose $x = i :: x''$. Then either
$\forall u''.\ \{u \le_i u''\} \not\subseteq C$ or $\exists u''.\ \{u \le_i u''\} \subseteq C$. In the former case, the result is trivial with
$v = u$, $x' = x$ and $y = \varepsilon$. In the latter case, we have $(u'', x'') \rightarrow (u', x')$. The induction
hypothesis gives $\exists v, y.\ x'' = y \oplus x' \land (u'', y) \rightarrow (v, \varepsilon)$. Then
$x = (i :: y) \oplus x' \land (u, i :: y) \rightarrow (v, \varepsilon)$, as required (substituting $i :: y$ for $y$). ■

Lemma 6-6. Component propagation property. Components propagate along instance
chains.

$$\forall u, x, u', v, c.\ (u, x) \rightarrow (v, \varepsilon) \land \{u \rhd_c u'\} \subseteq C \Rightarrow (\exists v'.\ (u', x) \rightarrow (v', \varepsilon) \land \{v \rhd_c v'\} \subseteq C)$$

This property can be illustrated using the following diagram. In all the illustrations
representing constraint sets, nodes represent variables. A dashed edge represents an instance
constraint, or (as in this case) a sequence of instance constraints. A solid edge represents a
component constraint, or a sequence of component constraints. The edges are labelled with
their instance or component labels; the nodes are labelled with the names of the variables.

[Diagram not reproduced.]

Any closed set containing the left-hand component must also contain the right-hand
component.

Proof: The proof is by induction on the length of $x$. For the base case $x = \varepsilon$, the result is
trivial, with $v = u$ and $v' = u'$. For the induction step, suppose $x = i :: x''$. Then for
some $t$, $\{u \le_i t\} \subseteq C$ and $(t, x'') \rightarrow (v, \varepsilon)$. By closure of $C$, there exists $t'$ such that
$\{t \rhd_c t'\} \subseteq C$ and $\{u' \le_i t'\} \subseteq C$. By the induction hypothesis,
$\exists v'.\ (t', x'') \rightarrow (v', \varepsilon) \land \{v \rhd_c v'\} \subseteq C$. It follows immediately that $(u', i :: x'') \rightarrow (v', \varepsilon)$,
as required. ■

Lemma 6-7. Instance transitivity property.

$$\forall u, x, u'.\ (u, x) \rightarrow (u', \varepsilon) \Rightarrow (\forall x', v, w.\ (u, x \oplus x') \rightarrow (v, w) \Leftrightarrow (u', x') \rightarrow (v, w))$$

This property can be illustrated using the following diagram:

[Diagram not reproduced.]

The small $w$ indicates that the instance chains converge at $v$, in both cases yielding the
same leftover instances $w$.

Proof: The proof is by induction on the length of $x$. For the base case $x = \varepsilon$, the result is
trivial, with $u = u'$. For the induction step, suppose $x = i :: x''$. Then for some $t$,
$\{u \le_i t\} \subseteq C$ and $(t, x'') \rightarrow (u', \varepsilon)$. By the induction hypothesis,
$\forall x', v, w.\ (t, x'' \oplus x') \rightarrow (v, w) \Leftrightarrow (u', x') \rightarrow (v, w)$. Suppose $(u', x') \rightarrow (v, w)$; then
$(t, x'' \oplus x') \rightarrow (v, w)$ and hence $(u, i :: x'' \oplus x') \rightarrow (v, w)$, as required. On the other hand,
suppose $(t, x'' \oplus x') \rightarrow (v, w)$; then $(u', x') \rightarrow (v, w)$ as required. ■

Lemma 6-8. Instance convergence property. Suppose that $u, u', s, s', c$ are given such
that $\{u \rhd_c s,\ u' \rhd_c s'\} \subseteq C$. Suppose also that $(u, x) \rightarrow (v, w)$, $(u', x') \rightarrow (v, w)$, and
$(s, x) \rightarrow (t, w')$, for some given $v, w, t, w'$. Then $(s', x') \rightarrow (t, w')$.

This can be illustrated as follows:

[Diagram not reproduced.]

Note how the instance evaluations of $u$ and $u'$ in contexts $x$ and $x'$ terminate at $v$ with
leftover instances $w$, but evaluation of $s$ and $s'$ in the same contexts may “go past” $v$’s
corresponding component. (This can happen because $v'$ may have some instances that $v$
does not have. Conceptually, $v$ could be the type of something that is local to a function,
but which has a component $v'$ that escapes to a wider context.) The important result here
is that even though the evaluations of $s$ and $s'$ do not necessarily yield $v'$, they do yield
the same result.

Proof: The proof is as follows: By Lemma 6-5 (instance suffix), there exist $y, y'$ such that
$x = y \oplus w \land (u, y) \rightarrow (v, \varepsilon)$ and $x' = y' \oplus w \land (u', y') \rightarrow (v, \varepsilon)$. By Lemma 6-6
(component propagation), $\exists v'.\ (s, y) \rightarrow (v', \varepsilon) \land \{v \rhd_c v'\} \subseteq C$. Then by Lemma 6-7
(instance transitivity), $\forall r, z.\ (s, y \oplus w) \rightarrow (r, z) \Leftrightarrow (v', w) \rightarrow (r, z)$. This implies
$(v', w) \rightarrow (t, w')$.

By another application of component propagation,
$\exists v''.\ (s', y') \rightarrow (v'', \varepsilon) \land \{v \rhd_c v''\} \subseteq C$. Because $C$ is closed and canonical, $v'' = v'$
(being matching components of $v$). Thus $(s', y') \rightarrow (v', \varepsilon)$. Invoking instance transitivity,
$\forall r, z.\ (s', y' \oplus w) \rightarrow (r, z) \Leftrightarrow (v', w) \rightarrow (r, z)$. But $(v', w) \rightarrow (t, w')$ and therefore
$(s', y' \oplus w) \rightarrow (t, w')$, i.e. $(s', x') \rightarrow (t, w')$ as required. ■

Lemma 6-9. Generalized instance convergence property.

Suppose that $u, u', s, s', c$ are given such that $u\langle c\rangle \rightarrow s \land u'\langle c\rangle \rightarrow s'$. Suppose also that
$(u, x) \rightarrow (v, w)$, $(u', x') \rightarrow (v, w)$, and $(s, x) \rightarrow (t, w')$ for some given $v, w, t, w'$. Then
$(s', x') \rightarrow (t, w')$.

Proof: The proof is by induction on the length of $c$. The base case is vacuous with $u = s$
and $u' = s'$. For the induction step, suppose $c = c_1 :: c'$, where $c_1$ is the head label. Then
$\exists r, r'.\ \{u \rhd_{c_1} r,\ u' \rhd_{c_1} r'\} \subseteq C \land r\langle c'\rangle \rightarrow s \land r'\langle c'\rangle \rightarrow s'$. By the existence property
(Lemma 6-2), $\exists v', w''.\ (r, x) \rightarrow (v', w'')$. Applying Lemma 6-8 (instance convergence),
$(r', x') \rightarrow (v', w'')$. Then applying the induction hypothesis, $(s', x') \rightarrow (t, w')$. ■

Lemma 6-10. Instance propagation property.

$$\forall u, c, v, u'.\ u\langle c\rangle \rightarrow v \land \{u \le_i u'\} \subseteq C \Rightarrow (\exists v'.\ u'\langle c\rangle \rightarrow v' \land \{v \le_i v'\} \subseteq C)$$

Proof: The proof is by induction on the length of $c$. It is trivially true for $c = \varepsilon$, with
$v = u$ and $v' = u'$. Suppose $c = c_1 :: c'$, where $c_1$ is the head label. Then for some $u''$ we
have $\{u \rhd_{c_1} u''\} \subseteq C$ and $u''\langle c'\rangle \rightarrow v$. By closure of $C$,
$\exists t.\ \{u'' \le_i t,\ u' \rhd_{c_1} t\} \subseteq C$. The induction hypothesis yields
$\exists v'.\ t\langle c'\rangle \rightarrow v' \land \{v \le_i v'\} \subseteq C$. Then $u'\langle c\rangle \rightarrow v'$ as required. ■

6.5.3 Constraints to Support Query Expressions

6.5.3.1  Inadequacy  of  Program  Constraints 

The  analysis  requires  variables  to  be  associated  with  arbitrary  bytecode  expressions.  This 
may  not  be  possible  using  only  the  constraints  that  are  derived  from  the  program. 

For example, consider the following method m:

    static void m(Foo f) { System.out.println("Hello Kitty"); }

Suppose some tool requires SEMI to decide whether
$\texttt{m\#0} : \texttt{f.fieldA} \leftrightarrow \texttt{m\#0} : \texttt{f.fieldB}$ is in the VPR. (The syntax “m#0” denotes
bytecode offset 0 in method m.) The method m does not mention f, and therefore there are
no constraints naming the components of f in the context of m. Therefore, although one can
show $\texttt{m\#0} : \texttt{f.fieldA} \rightarrow L_{m\#0}\langle 0 :: \texttt{fieldA}\rangle$, $L_{m\#0}\langle 0 :: \texttt{fieldA}\rangle$ does not evaluate to
any ground variable. If this situation were to stand, then the analysis would incorrectly
deduce that the two expressions are not related, when in fact they may be.

6.5.3.2  Query  Constraints 

To solve this problem, SEMI takes as input a set $Q$ of bytecode expressions required for the
query, and decides $e_1 \leftrightarrow e_2$ only for those $e_i$ in $Q$. For each expression $e$ in $Q$, constraints
are added to the constraint set $C$, ensuring that for any context $x$, $(e, x) \rightarrow (u', x')$ holds
for some $u'$, $x'$.

Formally, for each $e$ in $Q$, compute $u$ and $c_1, \ldots, c_k$ such that $e \rightarrow u\langle c_1 :: \ldots :: c_k :: \varepsilon\rangle$.
Choose fresh variables $v_1, \ldots, v_k$, and add the constraints
$\{u \rhd_{c_1} v_1,\ v_1 \rhd_{c_2} v_2,\ \ldots,\ v_{k-1} \rhd_{c_k} v_k\}$ to $N$. Then
$M(u)\langle c_1 :: \ldots :: c_k :: \varepsilon\rangle \rightarrow M(v_k)$. (If
$k = 0$, then set $v_k = u$ and the result holds.) Thus we have, for any context $x$, and for all
$e$ in $Q$, $e \rightarrow u\langle c\rangle$ and $u\langle c\rangle \rightarrow v$ for some $u, c, v$. From above, $\forall x.\ \exists t, x'.\ (v, x) \rightarrow (t, x')$.
Therefore, in summary:

$$\forall e \in Q.\ \forall x.\ \exists t, x'.\ (e, x) \rightarrow (t, x')$$
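The effect of query constraints on component evaluation can be sketched as follows. This is a simplified illustration under stated assumptions: the class `QueryConstraintSketch` and its string encoding are hypothetical, and instead of always adding fresh variables and relying on closure to merge them with existing components (as the text describes), the sketch creates a fresh variable only where a component is missing, which is assumed to have the same net effect on a canonical set.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: making a component chain evaluable by adding fresh variables. */
class QueryConstraintSketch {
    // components.get(u).get(c) == v encodes the component constraint "v is the c-component of u".
    private final Map<String, Map<String, String>> components = new HashMap<>();
    private int freshCounter = 0;

    void addComponent(String from, String label, String to) {
        components.computeIfAbsent(from, k -> new HashMap<>()).put(label, to);
    }

    /** Component evaluation: returns the variable reached, or null if the chain breaks. */
    String evaluate(String base, List<String> labels) {
        String current = base;
        for (String label : labels) {
            Map<String, String> out = components.get(current);
            current = (out == null) ? null : out.get(label);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    /** Ensures the chain exists, inventing a fresh variable for each missing component. */
    String ensure(String base, List<String> labels) {
        String current = base;
        for (String label : labels) {
            Map<String, String> out = components.computeIfAbsent(current, k -> new HashMap<>());
            String next = out.get(label);
            if (next == null) {
                next = "fresh" + (freshCounter++);
                out.put(label, next);
            }
            current = next;
        }
        return current;
    }
}
```

For the method m above, evaluating the chain for `f.fieldA` from the local-variable variable fails before `ensure` is called and succeeds afterwards, which is the property the query constraints are meant to establish.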

6.6  Implementing  the  Ajax  Interface 

The  previous  section  specifies  the  approximation  to  the  value-point  relation  computed  by 
SEMI.  This  section  describes  an  efficient  implementation  of  the  Ajax  interface  using  this 
approximation.  I  describe  how  the  Ajax  interface  is  implemented  in  terms  of  a  given  closed 
constraint  set;  SEMI’s  algorithm  for  computing  a  closed  constraint  set  is  described  in  the 
next  chapter. 

Recall that the Ajax API specifies the following parameters to the analysis:

• A type $D$ of intermediate data to be propagated

• A type $R$ of tool target data

• An associative, commutative, idempotent binary “merge” operator $D_M : D \times D \rightarrow D$
with identity element $D_E$

• A set $S$ of source value-points from which data will be propagated

• A set $T$ of target value-points to which data will be propagated

• An initial assignment of intermediate data to source value-points $D_S : S \rightarrow D$

• A map from target expressions to tool target data $T_R : T \rightarrow R$

The analysis computes:

$$\lambda t \in T.\ D_M\{D_S(s) \mid s \in S \land s \leftrightarrow t\}$$

This is computed efficiently using a graph, similar to the method used by RTA
(Section 5.3).

Note that the set of bytecode expressions $Q$ used above in Section 6.5.3.2 can be taken
simply as the union of $S$ and $T$.

Multiple  queries  are  treated  separately.  The  intermediate  data  computations  described 
below  are  local  to  each  query. 

6.6.1  The  Graph 

SEMI constructs a propagation graph with nodes

$$PN = \{\text{In-}t \mid t \in \mathrm{Variables}(C)\} \cup \{\text{Out-}t \mid t \in \mathrm{Variables}(C)\}$$

and edges

$$PE = \{(\text{In-}u, \text{In-}v) \mid \exists i.\ \{u \le_i v\} \subseteq C\} \cup \{(\text{Out-}v, \text{Out-}u) \mid \exists i.\ \{u \le_i v\} \subseteq C\} \cup \{(\text{In-}t, \text{Out-}t) \mid t \in \mathrm{Variables}(C)\}$$

Lemma 6-11. Path invariant. SEMI relates $e_1 \leftrightarrow e_2$ if and only if there is a path from
In-$u$ to Out-$v$ where $e_1 \rightarrow u'\langle c\rangle$, $u'\langle c\rangle \rightarrow u$, $e_2 \rightarrow v'\langle d\rangle$, and $v'\langle d\rangle \rightarrow v$ for some $u$, $v$,
$u'$, $v'$, $c$, $d$.

Intuitively, the two base types for the expressions have a common instance type if and only
if there is a path from one base type to the other in the propagation graph (which is
essentially two copies of the instance graph pasted together).

Proof: Suppose that SEMI relates $e_1 \leftrightarrow e_2$. Then

$$\exists t, x_1, x_2, x_1', x_2'.\ (e_1, x_1) \rightarrow (t, x_1') \land (e_2, x_2) \rightarrow (t, x_2')$$

From the uniqueness properties of the relations, we have $(u, x_1) \rightarrow (t, x_1')$ and
$(v, x_2) \rightarrow (t, x_2')$. (The existence of $u$, $v$, $u'$, $v'$, $c$, $d$ follows from the added query
constraints, as discussed above in Section 6.4.5.) It follows that there is a path in the graph
from In-$u$ to In-$t$ and from Out-$t$ to Out-$v$. There is an edge from In-$t$ to Out-$t$. Therefore,
there is a path from In-$u$ to Out-$v$.

Conversely, suppose there is a path from In-$u$ to Out-$v$. There must exist an edge in the path
connecting In-$t$ to Out-$t'$ for some $t$ and $t'$. All such edges are of the form (In-$t$, Out-$t$),
therefore $t' = t$. Furthermore there is a path from In-$u$ to In-$t$; this path passes only through
In nodes (because there are no edges from any Out node back to an In node). This implies
that for some sequence of instances $x_1$, $(u, x_1) \rightarrow (t, \varepsilon)$. Similarly there is a path from
Out-$t$ to Out-$v$ and for some $x_2$, $(v, x_2) \rightarrow (t, \varepsilon)$. Therefore
$(e_1, x_1) \rightarrow (t, \varepsilon) \land (e_2, x_2) \rightarrow (t, \varepsilon)$ and SEMI will conclude $e_1 \leftrightarrow e_2$. ■
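The path invariant suggests a direct reachability check. The sketch below is an illustration only, assuming variables are encoded as strings and that the caller has already mapped each expression to its base variable; the class `PropagationGraphSketch` and its method names are hypothetical. It builds the In and Out copies of the instance graph and searches for a path from In-u to Out-v.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch of the propagation graph: two copies of the instance graph joined by In-t to Out-t edges. */
class PropagationGraphSketch {
    private final Map<String, Set<String>> edges = new HashMap<>();

    private void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    /** Registers a variable t, adding the edge (In-t, Out-t). */
    void addVariable(String t) {
        addEdge("In-" + t, "Out-" + t);
    }

    /** Records that v is an instance of u; the instance label does not affect the graph. */
    void addInstance(String u, String v) {
        addVariable(u);
        addVariable(v);
        addEdge("In-" + u, "In-" + v);
        addEdge("Out-" + v, "Out-" + u);
    }

    /** Returns true if there is a path from In-u to Out-v. */
    boolean related(String u, String v) {
        String goal = "Out-" + v;
        Deque<String> work = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        work.push("In-" + u);
        seen.add("In-" + u);
        while (!work.isEmpty()) {
            String node = work.pop();
            if (node.equals(goal)) {
                return true;
            }
            for (String next : edges.getOrDefault(node, Collections.<String>emptySet())) {
                if (seen.add(next)) {
                    work.push(next);
                }
            }
        }
        return false;
    }
}
```

Two base variables with a common instance are reported as related, while a variable with no shared instance is related only to itself.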

6.6.2  Computing  Analysis  Results 

The  results  are  computed  efficiently  over  the  graph  using  almost  exactly  the  same 
algorithm  as  for  RTA  (Section  5.3.2).  The  only  difference  is  the  way  in  which  expressions 
are  mapped  to  nodes  in  the  graph. 

The assignment $A$ over graph nodes is computed iteratively as follows:

$$A_0(y) = D_M\{D_S(s) \mid s \in S \land (\exists u, c, u'.\ s \rightarrow u'\langle c\rangle \land u'\langle c\rangle \rightarrow u \land y = \text{In-}u)\}$$

$$A_{n+1}(y) = D_M(\{A_n(p) \mid (p, y) \in PE\} \cup \{A_n(y)\})$$

The algorithm terminates when $A_{n+1}(y) = A_n(y)$. The result of the analysis is then:

{(d,  F[{A(jt)  |  3t  g  T.Tr(0  =  d  a  f.jt}])  |  d  e  range  TR} 
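The iteration can be sketched generically over the intermediate data type. This is a naive round-based illustration, not the algorithm of Section 5.3.2: the class `FixpointSketch`, the string node names, and the adjacency-list encoding are assumptions, every edge endpoint is assumed to appear in `nodes`, and termination relies on the merge operator having the stated algebraic properties over a domain of finite height.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BinaryOperator;

/** Sketch of the iterative computation of the assignment over graph nodes. */
class FixpointSketch {
    /**
     * Repeats: new value at y = merge of the old value at y with the old values at
     * all predecessors of y, until no node changes.
     * edges maps each node to its successors; initial holds the starting data for seeded nodes.
     */
    static <D> Map<String, D> solve(Set<String> nodes,
                                    Map<String, List<String>> edges,
                                    Map<String, D> initial,
                                    BinaryOperator<D> merge,
                                    D identity) {
        Map<String, D> current = new HashMap<>();
        for (String y : nodes) {
            current.put(y, initial.getOrDefault(y, identity));
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            Map<String, D> next = new HashMap<>(current);
            for (Map.Entry<String, List<String>> entry : edges.entrySet()) {
                D fromValue = current.get(entry.getKey());
                for (String y : entry.getValue()) {
                    next.put(y, merge.apply(next.get(y), fromValue));
                }
            }
            if (!next.equals(current)) {
                changed = true;
                current = next;
            }
        }
        return current;
    }
}
```

With sets as the data type and set union as the merge, data seeded at an In node reaches every Out node that is reachable from it in the propagation graph.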

6.6.3  Incrementality 

The  algorithm  for  computing  the  closed  constraint  set  is  incremental,  in  the  sense  that 
adding  new  constraints  to  the  initial  set  (e.g.,  in  response  to  changes  in  the  input  program) 
will  cause  new  constraints  to  be  added  to  the  closed  result  set.  This  process  is  discussed 
further  in  Chapter  7. 

This  means  that  new  edges  and  nodes  are  added  to  an  existing  propagation  graph.  The 
results  are  updated  incrementally  in  response  to  changes  in  the  graph  and  in  the  analysis 
parameters,  in  much  the  same  way  as  the  RTA  implementation  operates  (Section  5.3.5). 

Because incremental extensions to the initial constraints are supported, there is actually no
need to know the set $Q$ of query expressions in advance. Whenever a new query expression
is encountered, it is added to $Q$ and everything is updated appropriately.

6.7  Proving  Soundness 

6.7.1  Overview 

6.7.1.1 Strategy

Suppose a tagged trace $T = \langle \Xi_0, \ldots, \Xi_n \rangle$ is given.

In Section 6.7.2 below, we define a function Creation($v$) mapping each tagged value $v$
occurring in the trace to a pair $(i, e')$. The idea is that the first occurrence of $v$ is in state $\Xi_i$
and can be obtained by evaluating $e'$ in that state.

In Section 6.7.4 we define a function Context($i$), mapping each state index $i$ to a context
associated with state $\Xi_i$. This context can be thought of as identifying, for each method in
the call stack, which of the polymorphic instances of the method is active. The definition
of the Context function requires an auxiliary CallerState function, defined in Section 6.7.3.
CallerState($k$) finds the state at which the “current method” executing in state $\Xi_k$ was
invoked.

Section 6.7.5 proves the following conformance lemma:

$$\forall i, e, v, u, x'.\ (\Xi_i, e) \Downarrow v \land (e, \mathrm{Context}(i)) \rightarrow (u, x') \Rightarrow \exists i', e'.\ \mathrm{Creation}(v) = (i', e') \land i' \le i \land (e', \mathrm{Context}(i')) \rightarrow (u, x')$$

The idea is that given an expression evaluating to a value in a particular state, we can look
back to where the value was created and determine the expression’s ground type in terms
of that creation state.

Soundness is a corollary of this lemma. By definition, two expressions related by the VPR
must give the same value when evaluated in some pair of states. Applying the conformance
lemma twice, once for each expression in its associated state, we show that the ground types
of the expressions are both equal to the ground type of the value, and therefore equal to each
other. Thus we can be sure that SEMI relates the two expressions.

Formally, suppose $e_1 \leftrightarrow e_2$ where $e_1, e_2 \in Q$. Then by definition there is a tagged trace $T$
and states $\Xi_i$ and $\Xi_j$ in $T$ such that $(\Xi_i, e_1) \Downarrow v$ and $(\Xi_j, e_2) \Downarrow v$ for some tagged $v$.

Choose $u_1$, $x_1'$ such that $(e_1, \mathrm{Context}(i)) \rightarrow (u_1, x_1')$ and $u_2$, $x_2'$ such that
$(e_2, \mathrm{Context}(j)) \rightarrow (u_2, x_2')$ (they must exist according to Section 6.5.3.2). Then by the
conformance lemma,

$$\exists i', e'.\ \mathrm{Creation}(v) = (i', e') \land i' \le i \land (e', \mathrm{Context}(i')) \rightarrow (u_1, x_1')$$

$$\exists i'', e''.\ \mathrm{Creation}(v) = (i'', e'') \land i'' \le j \land (e'', \mathrm{Context}(i'')) \rightarrow (u_2, x_2')$$

In Section 6.7.2.1 below, I show that Creation is a function, i.e., $i' = i''$ and $e' = e''$.
Therefore $u_1 = u_2$ and $x_1' = x_2'$ (Lemma 6-3). Thus the analysis concludes $e_1 \leftrightarrow e_2$. ■

6.7.1.2  Note:  Unique  Justification  for  Transitions 

Many of the proofs perform a case analysis of a transition $\Xi_i \leadsto \Xi_{i+1}$. This depends on the
fact that, given two states related in this way, there is always exactly one inference rule
justifying the transition.

To see that this is so consider the mode fields of the states $\Xi_i$ and $\Xi_{i+1}$. There are four
possibilities:

Mode($\Xi_i$)    Mode($\Xi_{i+1}$)    Applicable rule
Throwing       Throwing           “Exception return” is the only applicable rule.
Throwing       Running            “Exception catch” is the only applicable rule.
Running        Throwing           “Spontaneous exception throw” is the only applicable rule.
Running        Running            The applicable rule is uniquely determined by the value of
                                  Instruction(PC($\Xi_i$)).

6.7.2 The Creation Function

The creation function is defined by the rules given in Figure 6-11. I demonstrate two
important properties: that it is a function, and that it is defined for all tagged values that
appear in the trace.

6.7.2.1 “Creation” Is a Function

Lemma 6-12. For some arbitrary $v$, suppose that Creation($v$) $= (i, e)$ and
Creation($v$) $= (i', e')$. We show that $i = i'$ and $e = e'$.

Proof: From the definition of the Creation function, $(\Xi_i, e) \Downarrow v$ and $(\Xi_{i'}, e') \Downarrow v$.

If $i = i' = 0$, then $e$ must be of the form (Main, 0) : staticField and $e'$ of the form
(Main, 0) : staticField'. Then Tag($v$) = InitialTag(staticField) = InitialTag(staticField'),
hence staticField = staticField' by the fact that InitialTag is defined to be a bijection.

If $i = 0$ but $i' > 0$, then $\mathrm{Tag}(v) \in \mathrm{Used}(\Xi_0)$, and then $\mathrm{Tag}(v) \in \mathrm{Used}(\Xi_{i'-1})$, since
$\forall i, i'.\ i \le i' \Rightarrow \mathrm{Used}(\Xi_i) \subseteq \mathrm{Used}(\Xi_{i'})$; this fact is easily observed from the transition rules.
But given that Creation($v$) $= (i', e')$, for each rule that can justify $\Xi_{i'-1} \leadsto \Xi_{i'}$, there is a
constraint that $\mathrm{Tag}(v) \notin \mathrm{Used}(\Xi_{i'-1})$. Therefore this situation is impossible. Similar
reasoning excludes $i' = 0$ with $i > 0$.

Consider $i > 0$ and $i' > 0$. Then $\mathrm{Tag}(v) \notin \mathrm{Used}(\Xi_{i-1}) \land \mathrm{Tag}(v) \notin \mathrm{Used}(\Xi_{i'-1})$, but
$\mathrm{Tag}(v) \in \mathrm{Used}(\Xi_i) \land \mathrm{Tag}(v) \in \mathrm{Used}(\Xi_{i'})$, therefore
$\mathrm{Tag}(v) \in \mathrm{Used}(\Xi_j) \land \mathrm{Tag}(v) \in \mathrm{Used}(\Xi_{j'})$ for all $j \ge i$ and $j' \ge i'$. Therefore $i' - 1 < i$ and
$i - 1 < i'$, i.e., $i = i'$.

Now consider the transition $\Xi_{i-1} \leadsto \Xi_i$. If it is justified by one of the rules for
aconst_null, bipush, iadd, or instanceof, then $e = e' = \mathrm{PC}(\Xi_i)$ : stack-0.

If the transition is justified by the rule for new, and $e \ne e'$, then one of $e$ or $e'$ must be of
the form $\mathrm{PC}(\Xi_i)$ : stack-0.field. Without loss of generality, suppose
$e = \mathrm{PC}(\Xi_i)$ : stack-0.field. Then there are two cases, $e' = \mathrm{PC}(\Xi_i)$ : stack-0 or
$e' = \mathrm{PC}(\Xi_i)$ : stack-0.field' where field $\ne$ field'. Consulting the transition rule, the
former case is impossible because Tag($v$) $= t =$ tags(field) violates the condition
$t \notin \mathrm{range}\ \mathit{tags}$. The latter case is impossible because Tag($v$) = tags(field) = tags(field')
violates the condition that tags is a bijection.

The same reasoning applies to the case in which the transition is justified by the rule for
spontaneous exception throws, except that $e = \mathrm{PC}(\Xi_i)$ : exn.field and $e' = \mathrm{PC}(\Xi_i)$ : exn
or $e' = \mathrm{PC}(\Xi_i)$ : exn.field'. ■

6.7.3  The  CallerState  Function 

6.7.3.1  Definition 

The  CallerState  function  determines  at  which  state  in  a  trace  a  method  invocation  began: 
CallerState(A')  =  max  {i  \  i  <k  a  Bframe .  MStack(E/.)  =  frame  ::  MStack(Sy)} 
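
$\Xi_{i-1} \to \Xi_i$ justified by rule for aconst_null
$(\Xi_i, \mathrm{PC}(\Xi_i) : \text{stack-0}) \leadsto v$
----------------------------------------
$\mathrm{Creation}(v) = (i, \mathrm{PC}(\Xi_i) : \text{stack-0})$

$\Xi_{i-1} \to \Xi_i$ justified by rule for bipush byte
$(\Xi_i, \mathrm{PC}(\Xi_i) : \text{stack-0}) \leadsto v$
----------------------------------------
$\mathrm{Creation}(v) = (i, \mathrm{PC}(\Xi_i) : \text{stack-0})$

$\Xi_{i-1} \to \Xi_i$ justified by rule for iadd
$(\Xi_i, \mathrm{PC}(\Xi_i) : \text{stack-0}) \leadsto v$
----------------------------------------
$\mathrm{Creation}(v) = (i, \mathrm{PC}(\Xi_i) : \text{stack-0})$

$\Xi_{i-1} \to \Xi_i$ justified by rule for new classID
$(\Xi_i, \mathrm{PC}(\Xi_i) : \text{stack-0}) \leadsto v$
----------------------------------------
$\mathrm{Creation}(v) = (i, \mathrm{PC}(\Xi_i) : \text{stack-0})$

$\Xi_{i-1} \to \Xi_i$ justified by rule for new classID
$(\Xi_i, \mathrm{PC}(\Xi_i) : \text{stack-0}.\mathit{field}) \leadsto v$
----------------------------------------
$\mathrm{Creation}(v) = (i, \mathrm{PC}(\Xi_i) : \text{stack-0}.\mathit{field})$

$\Xi_{i-1} \to \Xi_i$ justified by rule for instanceof classID
$(\Xi_i, \mathrm{PC}(\Xi_i) : \text{stack-0}) \leadsto v$
----------------------------------------
$\mathrm{Creation}(v) = (i, \mathrm{PC}(\Xi_i) : \text{stack-0})$

$\Xi_{i-1} \to \Xi_i$ justified by rule for spontaneous exception throw
$(\Xi_i, \mathrm{PC}(\Xi_i) : \text{exn}) \leadsto v$
----------------------------------------
$\mathrm{Creation}(v) = (i, \mathrm{PC}(\Xi_i) : \text{exn})$

$\Xi_{i-1} \to \Xi_i$ justified by rule for spontaneous exception throw
$(\Xi_i, \mathrm{PC}(\Xi_i) : \text{exn}.\mathit{field}) \leadsto v$
----------------------------------------
$\mathrm{Creation}(v) = (i, \mathrm{PC}(\Xi_i) : \text{exn}.\mathit{field})$

$(\Xi_0, (\mathrm{Main}, 0) : \mathit{staticField}) \leadsto v$
----------------------------------------
$\mathrm{Creation}(v) = (0, (\mathrm{Main}, 0) : \mathit{staticField})$

Figure 6-11. Rules defining the Creation function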

It computes the state number $i$ which called into the method active at state $k$, by finding the
most recent state at which the call stack was one element shorter than the current call stack.
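
The definition can be read operationally. The following is a minimal sketch, not part of the
formal development: it assumes a trace is represented only by its sequence of call stacks, that
each stack is a tuple with the most recent frame first, and that a frame is a hypothetical
(method, offset) pair. The name `caller_state` is chosen here for illustration.

```python
from typing import Optional, Sequence, Tuple

Frame = Tuple[str, int]      # hypothetical frame encoding: (method, call-site offset)
Stack = Tuple[Frame, ...]    # call stack, most recent frame first


def caller_state(mstacks: Sequence[Stack], k: int) -> Optional[int]:
    """Largest i < k with mstacks[k] == frame :: mstacks[i], or None if undefined."""
    current = mstacks[k]
    if not current:
        return None              # empty call stack: no caller exists
    tail = current[1:]           # the current stack with its top frame removed
    for i in range(k - 1, -1, -1):
        if mstacks[i] == tail:   # most recent state whose stack is exactly that tail
            return i
    return None


# Illustrative trace: Main calls f at state 1, f calls g at state 3, then both return.
trace = [
    (),
    (),
    (("Main", 1),),
    (("Main", 1),),
    (("f", 1), ("Main", 1)),
    (("Main", 1),),
    (),
]
```

On this trace, states 2, 3 and 5 all belong to the same invocation of f, and the sketch maps each
of them to state 1; state 4 maps to state 3.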

This  function  is  used  below  to  define  the  Context  function.  Here  we  prove  some  “obvious” 
but  useful  properties  of  the  CallerState  function  that  are  required  below.  These  properties 
are  really  invariants  of  the  MJBC  semantics  ensuring  that  the  call  stack  and  the  program 
counter  behave  in  a  disciplined  way. 

6.7.3.2  Scope  of  Definition 

CallerState  is  defined  whenever  the  run  time  stack  is  nonempty  (i.e.,  the  current  method 
was  called  by  some  other  method). 

Lemma 6-13. The function CallerState is defined for all $k$ such that $\mathrm{MStack}(\Xi_k) \neq \varepsilon$.

Proof: To prove this, it suffices to prove that the set

$$\{\, i \mid i < k \land \exists \mathit{frame}.\ \mathrm{MStack}(\Xi_k) = \mathit{frame} :: \mathrm{MStack}(\Xi_i) \,\}$$

is nonempty if $\mathrm{MStack}(\Xi_k) \neq \varepsilon$. This is shown by induction on $k$.

For $k = 0$, $\mathrm{MStack}(\Xi_0) = \varepsilon$.

For $k > 0$, consider the transition $\Xi_{k-1} \to \Xi_k$. If the transition was not justified by a rule
for method invocation, method return, or exception return, then
$\mathrm{MStack}(\Xi_{k-1}) = \mathrm{MStack}(\Xi_k)$ and the result follows from the induction hypothesis.

If the transition was a method return or exception return, then
$\mathrm{MStack}(\Xi_{k-1}) = f :: \mathrm{MStack}(\Xi_k)$ for some $f$, and therefore $\mathrm{MStack}(\Xi_{k-1}) \neq \varepsilon$. Applying
the induction hypothesis, $\mathrm{CallerState}(k-1)$ is defined. Therefore there exists a $j$ such that

$$j < k - 1 \land \exists \mathit{frame}.\ \mathrm{MStack}(\Xi_{k-1}) = \mathit{frame} :: \mathrm{MStack}(\Xi_j)$$

Hence $\mathrm{MStack}(\Xi_j) = \mathrm{MStack}(\Xi_k)$. Then, using the induction hypothesis again, if
$\mathrm{MStack}(\Xi_k) = \mathrm{MStack}(\Xi_j) \neq \varepsilon$, then

$$\{\, i \mid i < j \land \exists \mathit{frame}.\ \mathrm{MStack}(\Xi_j) = \mathit{frame} :: \mathrm{MStack}(\Xi_i) \,\} \neq \emptyset$$

and hence

$$\{\, i \mid i < k \land \exists \mathit{frame}.\ \mathrm{MStack}(\Xi_k) = \mathit{frame} :: \mathrm{MStack}(\Xi_i) \,\} \neq \emptyset$$

If the transition was a method invocation, then for some $f$,
$\mathrm{MStack}(\Xi_k) = f :: \mathrm{MStack}(\Xi_{k-1})$. Then the set
$\{\, i \mid i < k \land \exists \mathit{frame}.\ \mathrm{MStack}(\Xi_k) = \mathit{frame} :: \mathrm{MStack}(\Xi_i) \,\}$ contains $k - 1$ and is
nonempty. ■

6.7.3.3 Nested Call Stack

The  call  stack  for  the  current  state  is  a  suffix  of  the  call  stack  in  every  state  during  the 
lifetime  of  the  current  method  invocation.  In  other  words,  the  call  stack  may  grow 
downward  due  to  this  method  calling  into  another  method,  but  the  current  activation  record 
and  the  records  above  it  on  the  stack  are  not  popped  or  modified.  We  only  need  to  prove 
this  for  states  between  the  current  state  and  the  invocation  of  the  current  method. 

Lemma 6-14. If $c = \mathrm{CallerState}(k)$ then

$$\forall i.\ c < i \le k \Rightarrow (\mathrm{MStack}(\Xi_k) \text{ is a suffix of } \mathrm{MStack}(\Xi_i))$$

Proof: The proof is by induction on $k - i$.

For $k - i = 0$, the result is trivial.

Now consider $k - i = p$ where the induction hypothesis holds for $k - i = p - 1$. That is,
assume $c < i < k$ and $\mathrm{MStack}(\Xi_k)$ is a suffix of $\mathrm{MStack}(\Xi_{i+1})$. Consider the transition
$\Xi_i \to \Xi_{i+1}$.

If the transition is not justified by a rule for method invocation, method return, or exception
return, then $\mathrm{MStack}(\Xi_i) = \mathrm{MStack}(\Xi_{i+1})$ and it follows immediately that
$\mathrm{MStack}(\Xi_k)$ is a suffix of $\mathrm{MStack}(\Xi_i)$.

If the transition is a method return or exception return, then
$\mathrm{MStack}(\Xi_i) = f :: \mathrm{MStack}(\Xi_{i+1})$ for some $f$ and again the result follows immediately.

If the transition is a method invocation, then for some $f$, $\mathrm{MStack}(\Xi_{i+1}) = f :: \mathrm{MStack}(\Xi_i)$.
By the induction hypothesis, either $\mathrm{MStack}(\Xi_{i+1}) = \mathrm{MStack}(\Xi_k)$ or $\mathrm{MStack}(\Xi_k)$ is a
proper suffix of $\mathrm{MStack}(\Xi_{i+1})$. In the latter case, $\mathrm{MStack}(\Xi_k)$ is a suffix of $\mathrm{MStack}(\Xi_i)$.
In the former case, one obtains $\mathrm{MStack}(\Xi_k) = f :: \mathrm{MStack}(\Xi_i)$. But then $i$ is an element of
the set $\{\, i' \mid i' < k \land \exists \mathit{frame}.\ \mathrm{MStack}(\Xi_k) = \mathit{frame} :: \mathrm{MStack}(\Xi_{i'}) \,\}$ and $i > c$, contradicting
the definition of $c$. ■
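
The statement of Lemma 6-14 can be checked mechanically on a concrete trace. The sketch below is
illustrative only and rests on the same assumptions as any such toy model: stacks are tuples
with the most recent frame first, frames are hypothetical (method, offset) pairs, and the helper
names `caller_state`, `is_suffix` and `nested_call_stack_holds` are invented here. The second
example is deliberately not a trace the semantics could produce, which shows that the lemma is
an invariant of the transition rules rather than of arbitrary stack sequences.

```python
from typing import Optional, Sequence, Tuple

Frame = Tuple[str, int]      # hypothetical frame encoding: (method, call-site offset)
Stack = Tuple[Frame, ...]    # call stack, most recent frame first


def caller_state(mstacks: Sequence[Stack], k: int) -> Optional[int]:
    """Largest i < k with mstacks[k] == frame :: mstacks[i], or None if undefined."""
    current = mstacks[k]
    if not current:
        return None
    for i in range(k - 1, -1, -1):
        if mstacks[i] == current[1:]:
            return i
    return None


def is_suffix(short: Stack, long: Stack) -> bool:
    """True when `short` equals the last len(short) frames of `long`."""
    return len(short) <= len(long) and long[len(long) - len(short):] == short


def nested_call_stack_holds(mstacks: Sequence[Stack], k: int) -> bool:
    """Check: for all i with c < i <= k, mstacks[k] is a suffix of mstacks[i]."""
    c = caller_state(mstacks, k)
    if c is None:
        return True              # the lemma's premise does not apply
    return all(is_suffix(mstacks[k], mstacks[i]) for i in range(c + 1, k + 1))


# A well-formed trace: calls push one frame, returns pop one frame.
good = [(), (), (("Main", 1),), (("Main", 1),),
        (("f", 1), ("Main", 1)), (("Main", 1),), ()]

# Not a legal trace: the frame is replaced and then restored without a return and call.
bad = [(), (("Main", 1),), (("Main", 5),), (("Main", 1),)]
```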

6.7.3.4  Preservation  of  Caller  State 

The  activation  record  on  top  of  the  call  stack  reflects  the  state  just  before  we  began  the 
current  method  invocation. 

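Lemma 6-15. If $c = \mathrm{CallerState}(k)$ and $\mathrm{MStack}(\Xi_k) = (\mathit{pc}, S, \Lambda) :: \phi$ then
$\Xi_c = [\text{pc}: \mathit{pc}, \text{wstack}: S, \text{locals}: \Lambda, \text{mstack}: \mu]$ for some value of $\mu$.

Proof: By the nested call stack lemma, $\mathrm{MStack}(\Xi_k)$ is a suffix of $\mathrm{MStack}(\Xi_{c+1})$. By the
definition of $c$, $\exists \mathit{frame}.\ \mathrm{MStack}(\Xi_k) = \mathit{frame} :: \mathrm{MStack}(\Xi_c)$. Therefore $\mathrm{MStack}(\Xi_c)$ is a
proper suffix of $\mathrm{MStack}(\Xi_{c+1})$, implying that the transition $\Xi_c \to \Xi_{c+1}$ must be a method
call. The method call rules guarantee that $\mathrm{MStack}(\Xi_{c+1}) = (\mathit{pc}, S, \Lambda) :: \mu$ where
$\Xi_c = [\text{pc}: \mathit{pc}, \text{wstack}: S, \text{locals}: \Lambda, \text{mstack}: \mu]$. Since $\mathrm{MStack}(\Xi_k) = \mathit{frame} :: \mu$ and
$\mathrm{MStack}(\Xi_k)$ is a suffix of $(\mathit{pc}, S, \Lambda) :: \mu$, it follows that $\mathrm{MStack}(\Xi_k) = (\mathit{pc}, S, \Lambda) :: \mu$. ■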
6.7.3.5 Method Entry Correspondence

On  beginning  the  current  method  invocation,  the  program  counter  was  set  to  bytecode 
offset  zero  of  the  current  method.  The  important  thing  to  prove  is  that  the  method 
invocation  actually  invoked  the  same  method  as  the  current  method. 

Lemma 6-16. If $c = \mathrm{CallerState}(k)$ then $\mathrm{PC}(\Xi_{c+1}) = (\mathrm{CodeLocMethod}(\mathrm{PC}(\Xi_k)), 0)$.

Proof: The proof is by induction on $k - c$. Since $k > c$, the base case is $k = c + 1$. Let
$(m, \mathit{offset}) = \mathrm{PC}(\Xi_{c+1})$. Then $\mathrm{CodeLocMethod}(\mathrm{PC}(\Xi_k)) = m$. Furthermore the
transition $\Xi_c \to \Xi_{c+1}$ is a method call, and therefore $\mathit{offset} = 0$, as required.

Now suppose $c = \mathrm{CallerState}(k)$ and consider the transition $\Xi_{k-1} \to \Xi_k$. Whenever
$\mathrm{MStack}(\Xi_{k-1}) = \mathrm{MStack}(\Xi_k)$ then the transition rule also requires
$\mathrm{CodeLocMethod}(\mathrm{PC}(\Xi_{k-1})) = \mathrm{CodeLocMethod}(\mathrm{PC}(\Xi_k))$, and then the result follows
from the induction hypothesis.

If the transition was a method invocation, then for some $f$,
$\mathrm{MStack}(\Xi_k) = f :: \mathrm{MStack}(\Xi_{k-1})$. But that implies $c = \mathrm{CallerState}(k) = k - 1$, which
only occurs in the base case.

If the transition was a method return or exception return, then
$\mathrm{MStack}(\Xi_{k-1}) = (\mathrm{PC}(\Xi_k) - x, S, \Lambda) :: \mathrm{MStack}(\Xi_k)$ for some $S, \Lambda$, where $x = 0$ for
exceptional returns and $x = 1$ for normal returns. Let $d = \mathrm{CallerState}(k - 1)$. By
preservation of caller state (Lemma 6-15), $\mathrm{PC}(\Xi_d) = \mathrm{PC}(\Xi_k) - x$ and
$\mathrm{MStack}(\Xi_d) = \mathrm{MStack}(\Xi_k)$. This also gives
$\mathrm{CodeLocMethod}(\mathrm{PC}(\Xi_d)) = \mathrm{CodeLocMethod}(\mathrm{PC}(\Xi_k))$. Furthermore, by the nested call
stack lemma (Lemma 6-14),

$$\forall i.\ d < i \le k - 1 \Rightarrow \mathrm{MStack}(\Xi_{k-1}) \text{ is a suffix of } \mathrm{MStack}(\Xi_i)$$

Therefore

$$\mathrm{CallerState}(k) = \max \{\, i \mid i < k \land \exists \mathit{frame}.\ \mathrm{MStack}(\Xi_k) = \mathit{frame} :: \mathrm{MStack}(\Xi_i) \,\}$$
$$= \max \{\, i \mid i < d \land \exists \mathit{frame}.\ \mathrm{MStack}(\Xi_k) = \mathit{frame} :: \mathrm{MStack}(\Xi_i) \,\}$$

But $\mathrm{MStack}(\Xi_d) = \mathrm{MStack}(\Xi_k)$ and therefore

$$\mathrm{CallerState}(k) = \max \{\, i \mid i < d \land \exists \mathit{frame}.\ \mathrm{MStack}(\Xi_d) = \mathit{frame} :: \mathrm{MStack}(\Xi_i) \,\}$$

That is, $\mathrm{CallerState}(k) = \mathrm{CallerState}(d) = c$. Now we appeal to the induction hypothesis
applied to $d$. ■

6.7.4  The  Context  Function 

The  Context  function  maps  a  state  index  to  a  list  of  instance  labels,  identifying  exactly 
which  polymorphic  instance  of  each  currently  active  method  was  invoked. 
6.7.4.1 Definition of the Context Function

The Context function is defined inductively as follows:

$$\mathrm{Context}(0) = \varepsilon$$

For $i > 0$, $\mathrm{Context}(i)$ depends on the form of the transition $\Xi_{i-1} \to \Xi_i$.

Case: The transition is justified by the rule for invokestatic.

$$\mathrm{Context}(i) = \mathrm{PC}(\Xi_{i-1}) :: \mathrm{Context}(i - 1)$$

Case: The transition is justified by the rule for invokevirtual.

Then $\Xi_{i-1}$ is of the form $[\text{pc}: \mathit{pc}, \text{wstack}: v_1 :: v_0 :: S, \text{locals}: \Lambda, \text{mstack}: \mu, \text{heap}: \eta]$, and
$\mathrm{Instruction}(\mathit{pc}) = \text{invokevirtual } \mathit{methodID}$. Let $(i', e) = \mathrm{Creation}(v_0)$ and
$\mathit{classID} = \mathrm{HeapObjClass}(\eta(\mathrm{Val}(v_0)))$. Now consider the transition $\Xi_{i'-1} \to \Xi_{i'}$. If it is
justified by the rule for new, set

$$\mathrm{Context}(i) = \mathit{classID}\text{-}\mathit{methodID} :: \mathrm{PC}(\Xi_{i'-1}) :: \mathrm{Context}(i')$$

Otherwise it is justified by the rule for spontaneous exception throws, since that is the only
other creating rule which adds a mapping for $\mathrm{Val}(v_0)$ to $\eta$. Set

$$\mathrm{Context}(i) = \mathit{classID}\text{-}\mathit{methodID} :: \text{err-}\mathit{classID} :: \text{err-}\mathrm{PC}(\Xi_{i'-1}) :: \mathrm{Context}(i')$$

Case: The transition is justified by the rule for return.

$$\mathrm{Context}(i) = (\mathrm{PC}(\Xi_i) - 1)\text{-}\mathrm{PC}(\Xi_i) :: \mathrm{Context}(\mathrm{CallerState}(i - 1))$$

$\mathrm{CallerState}(i - 1)$ is well-defined because $\mathrm{MStack}(\Xi_{i-1})$ must be nonempty for the
return to execute successfully.

Case: The transition is justified by the rule for exceptional returns.

$$\mathrm{Context}(i) = \mathrm{Context}(\mathrm{CallerState}(i - 1))$$

The reason for the asymmetry between normal and exceptional returns is that a normal
return transfers control to the instruction following the method invocation, but an
exceptional return does not.

Case: The transition is justified by a rule for exception throws (either an execution of
athrow or a spontaneous exception throw).

$$\mathrm{Context}(i) = \mathrm{Context}(i - 1)$$

Exception throw transitions simply change the state from RUNNING to THROWING and do
not themselves transfer control.

Case: All other transitions induce the following rule:

$$\mathrm{Context}(i) = \mathrm{PC}(\Xi_{i-1})\text{-}\mathrm{PC}(\Xi_i) :: \mathrm{Context}(i - 1)$$
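
To see how returns truncate the context, it can help to run the definition on a small trace. The
following is a simplified sketch under several assumptions that are not in the text: a trace is
given as three parallel lists (transition kind, program counter, call stack), a frame is reduced
to the call-site program counter only, and the invokevirtual case is omitted because it depends
on the Creation function and the heap. The names `context`, `caller_state`, and the kind strings
are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

PC = Tuple[str, int]             # (method, bytecode offset)
Stack = Tuple[PC, ...]           # simplified frames: call-site PCs only, top first


def caller_state(mstacks: Sequence[Stack], k: int) -> Optional[int]:
    """Largest i < k with mstacks[k] == frame :: mstacks[i], or None if undefined."""
    current = mstacks[k]
    if not current:
        return None
    for i in range(k - 1, -1, -1):
        if mstacks[i] == current[1:]:
            return i
    return None


def context(kinds: Sequence[Optional[str]], pcs: Sequence[PC],
            mstacks: Sequence[Stack], i: int) -> List[object]:
    """Instance-label list for state i; kinds[i] names the transition from i-1 to i."""
    if i == 0:
        return []
    kind = kinds[i]
    if kind == "invokestatic":
        # label is the call site's program counter
        return [pcs[i - 1]] + context(kinds, pcs, mstacks, i - 1)
    if kind in ("return", "exc_return"):
        c = caller_state(mstacks, i - 1)
        if c is None:
            raise ValueError("return executed with an empty call stack")
        caller_ctx = context(kinds, pcs, mstacks, c)
        if kind == "exc_return":
            return caller_ctx            # no successor label for exceptional returns
        method, offset = pcs[i]
        return [((method, offset - 1), pcs[i])] + caller_ctx
    if kind == "throw":
        return context(kinds, pcs, mstacks, i - 1)
    if kind == "other":
        return [(pcs[i - 1], pcs[i])] + context(kinds, pcs, mstacks, i - 1)
    raise ValueError(f"unsupported transition kind: {kind}")


# Main executes one instruction, calls f statically, f executes one instruction and returns.
kinds = [None, "other", "invokestatic", "other", "return"]
pcs = [("Main", 0), ("Main", 1), ("f", 0), ("f", 1), ("Main", 2)]
mstacks = [(), (), (("Main", 1),), (("Main", 1),), ()]
```

In this example the context at state 3 carries three labels (one for the step inside f, one for
the call, one for the step inside Main), while the context at state 4 has dropped both callee
labels and gained a single label for the step from the call site to its successor.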
6.7.4.2 Preservation of Return Types

This lemma proves that the return type $R_{pc}$ and the type $X_{pc}$ of any thrown exception at
some instruction $\mathit{pc}$ map correctly to the actual return type and exception type of the
method.

Lemma 6-17. The return type and thrown exception type inferred for a method
correspond to the return type and exception type actually used in all contexts.

$$\forall i, m, c.\ m = \mathrm{CodeLocMethod}(\mathrm{PC}(\Xi_i)) \land c = \mathrm{CallerState}(i) \Rightarrow$$
$$\exists w.\ \mathrm{Context}(i) = w \oplus \mathrm{Context}(c + 1)$$
$$\land\ (M(R_{\mathrm{PC}(\Xi_i)}), w) \to (M(R_{(m,0)}), \varepsilon) \land (M(X_{\mathrm{PC}(\Xi_i)}), w) \to (M(X_{(m,0)}), \varepsilon)$$

Proof: The proof is by induction on $i - c$.

The fact $c = \mathrm{CallerState}(i)$ implies $c < i$. Therefore the base case is $i = c + 1$. Set
$w = \varepsilon$, and the result is trivial, noting $\mathrm{PC}(\Xi_{c+1}) = (m, 0)$ by the method entry
correspondence lemma (Lemma 6-16).

Now consider the transition $\Xi_{i-1} \to \Xi_i$.

Case: The transition is an exception throw. Then $\mathrm{PC}(\Xi_{i-1}) = \mathrm{PC}(\Xi_i)$ and
$\mathrm{Context}(i) = \mathrm{Context}(i - 1)$. Also $\mathrm{MStack}(\Xi_{i-1}) = \mathrm{MStack}(\Xi_i)$ implying
$c = \mathrm{CallerState}(i) = \mathrm{CallerState}(i - 1)$. We apply the induction hypothesis to get

$$\exists w.\ \mathrm{Context}(i - 1) = w \oplus \mathrm{Context}(c + 1)$$
$$\land\ (M(R_{\mathrm{PC}(\Xi_{i-1})}), w) \to (M(R_{(m,0)}), \varepsilon) \land (M(X_{\mathrm{PC}(\Xi_{i-1})}), w) \to (M(X_{(m,0)}), \varepsilon)$$

This is equivalent to the desired result.

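Case: The transition is the normal execution of an instruction other than invokestatic,
invokevirtual or return. Then let $\mathit{pc} = \mathrm{PC}(\Xi_{i-1})$ and $\mathit{pc}' = \mathrm{PC}(\Xi_i)$; then
$\mathrm{CodeLocMethod}(\mathit{pc}) = \mathrm{CodeLocMethod}(\mathit{pc}')$, and
$\mathrm{Context}(i) = \mathit{pc}\text{-}\mathit{pc}' :: \mathrm{Context}(i - 1)$. Also $\mathrm{MStack}(\Xi_{i-1}) = \mathrm{MStack}(\Xi_i)$ implying
$c = \mathrm{CallerState}(i) = \mathrm{CallerState}(i - 1)$. We apply the induction hypothesis to get

$$\exists w'.\ \mathrm{Context}(i - 1) = w' \oplus \mathrm{Context}(c + 1)$$
$$\land\ (M(R_{pc}), w') \to (M(R_{(m,0)}), \varepsilon) \land (M(X_{pc}), w') \to (M(X_{(m,0)}), \varepsilon)$$

The executed instruction induces the constraints $\mathrm{Succ}(\mathit{pc}, \mathit{pc}', s, l)$ for some $s$ and $l$.
Therefore $\{ M(R_{pc'}) \preceq_{pc\text{-}pc'} M(R_{pc}),\ M(X_{pc'}) \preceq_{pc\text{-}pc'} M(X_{pc}) \} \subseteq C$. Set $w = \mathit{pc}\text{-}\mathit{pc}' :: w'$.
Then $\mathrm{Context}(i) = w \oplus \mathrm{Context}(c + 1)$ and

$$(M(R_{pc'}), w) \to (M(R_{(m,0)}), \varepsilon) \land (M(X_{pc'}), w) \to (M(X_{(m,0)}), \varepsilon)$$

as required.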

Case: The transition was a method invocation. Then for some $f$,
$\mathrm{MStack}(\Xi_i) = f :: \mathrm{MStack}(\Xi_{i-1})$. But that implies $c = \mathrm{CallerState}(i) = i - 1$, which
only occurs in the base case, so this case cannot occur.

Case: The transition was a method return or exceptional return. Then
$\mathrm{MStack}(\Xi_{i-1}) = (\mathrm{PC}(\Xi_i) - x, S, \Lambda) :: \mathrm{MStack}(\Xi_i)$ for some $S, \Lambda$, where $x = 0$ for
exceptional returns and $x = 1$ for normal returns. Let $d = \mathrm{CallerState}(i - 1)$. By
preservation of caller state, $\mathrm{PC}(\Xi_d) = \mathrm{PC}(\Xi_i) - x$ and $\mathrm{MStack}(\Xi_d) = \mathrm{MStack}(\Xi_i)$. This also
gives $m = \mathrm{CodeLocMethod}(\mathrm{PC}(\Xi_i)) = \mathrm{CodeLocMethod}(\mathrm{PC}(\Xi_d))$. Furthermore, by the
nested call stack lemma, $\forall i'.\ d < i' \le i - 1 \Rightarrow \mathrm{MStack}(\Xi_{i-1})$ is a suffix of $\mathrm{MStack}(\Xi_{i'})$.
Therefore $c = \mathrm{CallerState}(i) = \mathrm{CallerState}(d)$. Now we appeal to the induction
hypothesis applied to $d$, yielding

$$\exists w'.\ \mathrm{Context}(d) = w' \oplus \mathrm{Context}(c + 1)$$
$$\land\ (M(R_{\mathrm{PC}(\Xi_d)}), w') \to (M(R_{(m,0)}), \varepsilon) \land (M(X_{\mathrm{PC}(\Xi_d)}), w') \to (M(X_{(m,0)}), \varepsilon)$$

If the transition was an exceptional return, then $\mathrm{PC}(\Xi_d) = \mathrm{PC}(\Xi_i)$ and
$\mathrm{Context}(i) = \mathrm{Context}(d)$; the required result is obtained by setting $w = w'$.

Otherwise the transition was a normal return. Then $\mathrm{PC}(\Xi_d) = \mathrm{PC}(\Xi_i) - 1$ and
$\mathrm{Context}(i) = (\mathrm{PC}(\Xi_i) - 1)\text{-}\mathrm{PC}(\Xi_i) :: \mathrm{Context}(\mathrm{CallerState}(i - 1))$. The method invocation
instruction at $d$ induces the constraints $\mathrm{Succ}(\mathrm{PC}(\Xi_d), \mathrm{PC}(\Xi_i), s, l)$ for some $s$ and $l$.
Therefore

$$\{ M(R_{\mathrm{PC}(\Xi_i)}) \preceq_{(\mathrm{PC}(\Xi_i) - 1)\text{-}\mathrm{PC}(\Xi_i)} M(R_{\mathrm{PC}(\Xi_i) - 1}),$$
$$M(X_{\mathrm{PC}(\Xi_i)}) \preceq_{(\mathrm{PC}(\Xi_i) - 1)\text{-}\mathrm{PC}(\Xi_i)} M(X_{\mathrm{PC}(\Xi_i) - 1}) \} \subseteq C$$

Set $w = (\mathrm{PC}(\Xi_i) - 1)\text{-}\mathrm{PC}(\Xi_i) :: w'$. Then

$$\mathrm{Context}(i) = w \oplus \mathrm{Context}(c + 1)$$
$$\land\ (M(R_{\mathrm{PC}(\Xi_i)}), w) \to (M(R_{(m,0)}), \varepsilon) \land (M(X_{\mathrm{PC}(\Xi_i)}), w) \to (M(X_{(m,0)}), \varepsilon)$$ ■

6.7.5  Proving  the  Conformance  Lemma 

Lemma 6-18. To reprise Section 6.7.1.1, the conformance lemma states:

$$\forall i, e, v, u, x.\ (\Xi_i, e) \leadsto v \land (e, \mathrm{Context}(i)) \to (u, x) \Rightarrow$$
$$\exists i', e'.\ \mathrm{Creation}(v) = (i', e') \land i' \le i \land (e', \mathrm{Context}(i')) \to (u, x)$$

The proof is by induction on $i$. The induction hypothesis is strengthened to note that, in
every state, the ground type for the global variable record is the type given to it at the
beginning of Main:

$$\forall i.\ ((M(G_{\mathrm{PC}(\Xi_i)}), \mathrm{Context}(i)) \to (M(G_{(\mathrm{Main},0)}), \varepsilon)$$
$$\land\ \forall e, v, u, x.\ (\Xi_i, e) \leadsto v \land (e, \mathrm{Context}(i)) \to (u, x) \Rightarrow$$
$$\exists i', e'.\ \mathrm{Creation}(v) = (i', e') \land i' \le i \land (e', \mathrm{Context}(i')) \to (u, x))$$

The base case is proved in Section 6.7.5.1. It is trivial.

For the induction step, I assume the hypothesis is true for $i \le k$ and prove it true for
$i = k + 1$.
The basic strategy to prove the induction result is to show that most transitions
“preserve types” by extending the context with an instance label (corresponding to method
call or intra-method control flow) and by making the types of local variables (and stack
locations) at the old code location appropriate instances of the types of local variables (and
stack locations) at the new code location. This ensures that the ground type obtained for $e$
evaluated in $\mathrm{Context}(k + 1)$ is the same as when it is evaluated in $\mathrm{Context}(k)$, and we can
appeal to the induction hypothesis to show that it is the correct $(u, x)$.

This  is  not  possible  for  all  transitions,  because  most  transitions  change  the  program  state, 
and  therefore  for  some  expressions  e  the  value  obtained  by  evaluating  e  in  the  new  state 
differs  from  the  result  of  evaluating  e  in  the  old  state.  Typically  these  cases  are  proved  by 
showing  that  the  initial  constraints  require  the  type  of  e  to  be  related  to  the  type  of  some 
other  expression  e' ,  where  e'  in  the  old  state  evaluates  to  the  same  value  as  e  in  the  new 
state.  This  allows  us  to  again  appeal  to  the  induction  hypothesis. 

Some other cases require different techniques. For example, transitions that create new
values prove the result by appealing directly to the definition of Creation, without resorting
to the induction hypothesis. As another example, the return instruction truncates the
Context for the current state back to the Context of the caller; this case requires the
“preservation of return types” Lemma 6-17 from above, as well as other machinery.

In Section 6.7.5.3 we prove the first part of the induction result itself:
$(M(G_{\mathrm{PC}(\Xi_{k+1})}), \mathrm{Context}(k + 1)) \to (M(G_{(\mathrm{Main},0)}), \varepsilon)$. The proof is relatively simple because it
does not depend on $e$ and only requires a case analysis of the transition $\Xi_k \to \Xi_{k+1}$.
Furthermore, only a few transitions modify global variables.

Section 6.7.5.4 proves the rest of the induction result for expressions $e$ of the form
$\mathrm{PC}(\Xi_{k+1}) : \mathit{exp}.f$, assuming it holds for $\mathrm{PC}(\Xi_{k+1}) : \mathit{exp}$. This step also requires case
analysis of the relevant transition. Again, most of the cases are easy because most transitions
do not modify object fields.

Section 6.7.5.5 proves the result for expressions of the form $\mathrm{PC}(\Xi_{k+1}) : \mathit{staticField}$.
Again, only a few transitions modify static fields.

The simple expressions referring to stack and local variables require the most work, and are
handled in Section 6.7.5.6 and following sections. For these expressions, we perform a case
analysis of the form of the transition and then break down the expression type within each
transition, according to the manner in which stack and local variables are modified by the
transition. (Almost every transition modifies the working stack or local variables in some
way.)

The proof is simplified by codifying the strategy described above (which relates the
expression $e$ to some expression $e'$, where $e'$ in the old state evaluates to the same value
as $e$ in the new state) using a “reduction function” (Section 6.7.5.7) mapping $e$ to $e'$. The
proof also uses a “succession lemma” (Section 6.7.5.8), which captures the invariants
induced by the use of the Succ function in the initial constraints. Nevertheless for each
transition, some case analysis of the form of $e$ is required.

One key supporting lemma is proved in the context of the induction hypothesis: Lemma 6-19
in Section 6.7.5.2. This lemma shows that at the invocation of a virtual method, the type
of the method body actually invoked matches the type assigned to the method at the
invocation site, in the sense that they have the same set of ground types. (It is not
necessarily the case that one is an instance of the other.) This is used to show that virtual
method calls and returns preserve types. This lemma follows by showing that the type assigned
to the object at the invocation site matches the object’s type at its creation, which is a
consequence of the induction hypothesis.

6.7.5.1  Base  Case 

The base case is $i = 0$. Suppose $(\Xi_0, e) \leadsto v$. By the definition of a trace,

$\Xi_0$ = [mode: RUNNING, pc: (Main, 0), wstack: $\varepsilon$, locals: [], mstack: $\varepsilon$, heap: [],
globals: InitStaticFields, used: range InitialTags]

In this state, expressions of the form $\mathit{pc} : \text{stack-}n$ and $\mathit{pc} : \text{local-}n$ do not evaluate to
anything. Also, since the heap is empty, expressions of the form $\mathit{pc} : \mathit{exp}.\mathit{field}$ do not
evaluate to anything. Therefore $e$ must be of the form $(\mathrm{Main}, 0) : \mathit{staticField}$. Therefore
$\mathrm{Creation}(v) = (0, e)$, i.e. $i' = 0 = i$ and $e' = e$; noting that $\mathrm{PC}(\Xi_0) = (\mathrm{Main}, 0)$ and
$\mathrm{Context}(0) = \varepsilon$ gives the induction result.

6.7.5.2  Preservation  of  Virtual  Call  Types 

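Lemma 6-19. The types inferred for a virtual method implementation match up with the
types inferred at each call site.

$$\forall i, \mathit{methodID}, \mathit{methodImpl}, c, v, v', u, x.$$
$$\Xi_i \to \Xi_{i+1} \land i \le k \land \mathrm{Instruction}(\mathrm{PC}(\Xi_i)) = \text{invokevirtual } \mathit{methodID}$$
$$\land\ \mathrm{PC}(\Xi_{i+1}) = (\mathit{methodImpl}, 0) \land \mathrm{Mode}(\Xi_i) = \mathrm{Mode}(\Xi_{i+1}) = \text{RUNNING}$$
$$\land\ M(T_{\mathrm{PC}(\Xi_i), v0}) \langle \mathit{methodID} :: c :: \varepsilon \rangle \to v \land \{ M(M_{\mathit{methodImpl}}) \rhd_c v' \} \subseteq C$$
$$\Rightarrow ((v, \mathrm{Context}(i)) \to (u, x) \Leftrightarrow (v', \mathrm{Context}(i + 1)) \to (u, x))$$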

Proof: $\Xi_i$ is of the form $[\text{pc}: \mathit{pc}, \text{wstack}: v_1 :: v_0 :: S, \text{locals}: \Lambda, \text{mstack}: \mu, \text{heap}: \eta]$.
Let $(i', e') = \mathrm{Creation}(v_0)$ and $\mathit{classID} = \mathrm{HeapObjClass}(\eta(\mathrm{Val}(v_0)))$. We then let
$\mathit{pc}' = \mathrm{PC}(\Xi_{i+1}) = (\mathit{methodImpl}, 0)$, where
$\mathit{methodImpl} = \mathrm{Dispatch}(\mathit{classID}, \mathit{methodID})$.

Let $\mathit{pc}'' = \mathrm{PC}(\Xi_{i'-1})$. Consider the transition $\Xi_{i'-1} \to \Xi_{i'}$. The transition adds a mapping
for $v_0$ in the heap, therefore the transition is either an execution of new or a spontaneous
exception throw. In Lemma 6-20 below, we show that in either case, for some
$w, s, s', s'', c$: $e' \to s' \langle c \rangle$, $s' \langle c \rangle \to s''$, $s'' \langle \mathit{methodID} :: c :: \varepsilon \rangle \to s$ and $(v', w) \to (s, \varepsilon)$
where $\mathrm{Context}(i + 1) = w \oplus \mathrm{Context}(i')$. This means that the created object, in the context
in which it is created, has a type $s$ for the given component $c$ of the object’s method
$\mathit{methodID}$, and $s$ is an instance of the type $v'$ we observe for the method’s component in
state $\Xi_{i+1}$.

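The constraints for the invokevirtual instruction include

$$\{ T_{pc,v0} \rhd_{\mathit{methodID}} T_{pc,m},\ T_{pc,m} \rhd_{\text{globals}} G_{pc} \}$$

We have $(\Xi_i, \mathit{pc} : \text{stack-1}) \leadsto v_0$, $\mathit{pc} : \text{stack-1} \to S_{pc} \langle \text{tail} :: \text{head} :: \varepsilon \rangle$, and
$M(S_{pc}) \langle \text{tail} :: \text{head} :: \varepsilon \rangle \to M(T_{pc,v0})$. Now, for some $u', x'$,
$(M(T_{pc,v0}), \mathrm{Context}(i)) \to (u', x')$ and then $(\mathit{pc} : \text{stack-1}, \mathrm{Context}(i)) \to (u', x')$. By the
induction hypothesis, $(e', \mathrm{Context}(i')) \to (u', x')$, i.e. $(s'', \mathrm{Context}(i')) \to (u', x')$. It is
valid to apply the induction hypothesis because $i \le k$.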

Now assume $(v, \mathrm{Context}(i)) \to (u, x)$. Applying the generalized instance convergence
property with $c = \mathit{methodID} :: c :: \varepsilon$ gives $(s, \mathrm{Context}(i')) \to (u, x)$. Then, recalling
$(v', w) \to (s, \varepsilon)$, we have $(v', w \oplus \mathrm{Context}(i')) \to (u, x)$, i.e.
$(v', \mathrm{Context}(i + 1)) \to (u, x)$.

Conversely, assuming $(v', \mathrm{Context}(i + 1)) \to (u, x)$, i.e. $(v', w \oplus \mathrm{Context}(i')) \to (u, x)$,
and knowing $(v', w) \to (s, \varepsilon)$, the instance transitivity property shows
$(s, \mathrm{Context}(i')) \to (u, x)$. Applying the generalized instance convergence property with
$c = \mathit{methodID} :: c :: \varepsilon$ gives $(v, \mathrm{Context}(i)) \to (u, x)$. ■

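Lemma 6-20. Sub-lemma of Lemma 6-19: For some $w, s, s', s'', c$: $e' \to s' \langle c \rangle$,
$s' \langle c \rangle \to s''$, $s'' \langle \mathit{methodID} :: c :: \varepsilon \rangle \to s$ and $(v', w) \to (s, \varepsilon)$ where
$\mathrm{Context}(i + 1) = w \oplus \mathrm{Context}(i')$.

Proof: The proof is by a case analysis of the transition $\Xi_{i'-1} \to \Xi_{i'}$ introduced above.

Case: The transition $\Xi_{i'-1} \to \Xi_{i'}$ is justified by the rule for new. Then $\mathrm{PC}(\Xi_{i'}) = \mathit{pc}'' + 1$
and $\mathrm{Context}(i + 1) = \mathit{classID}\text{-}\mathit{methodID} :: \mathit{pc}'' :: \mathrm{Context}(i')$.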

The  constraints  for  new  give 

{  S  pC"  l>tail  Spc"’  SpC"+i  I>head  ^-pc" ,v  ^ classID  ^ pc "  ^pc" , v  )  ^ 

Succ(pc" ,pc"+\,  S'pc",  L pc") 

We  also  have  the  initial  constraints 

{  M methodlmpl  ^ classID-methodID  ^  classID , methodID  ^ classID  ^ methodID  ^ classID methodID  } 

Because $v_0$ has a mapping in the heap, $e' = \mathrm{PC}(\Xi_{i'}) : \text{stack-0}$ (the other expressions
created by new do not have heap mappings). Now
$\mathrm{PC}(\Xi_{i'}) : \text{stack-0} \to S_{pc''+1} \langle \text{head} :: \varepsilon \rangle$ and $M(S_{pc''+1}) \langle \text{head} :: \varepsilon \rangle \to T_{pc'',v}$.
From the program constraints and the assumption $\{ M(M_{\mathit{methodImpl}}) \rhd_c v' \} \subseteq C$, we get
$(v', \mathit{classID}\text{-}\mathit{methodID} :: \mathit{pc}'' :: \varepsilon) \to (s, \varepsilon)$ for some $s$, where
$M(T_{pc'',v}) \langle \mathit{methodID} :: c :: \varepsilon \rangle \to s$.

So we set $w = \mathit{classID}\text{-}\mathit{methodID} :: \mathit{pc}'' :: \varepsilon$, $c = \text{head} :: \varepsilon$, $s' = S_{pc''+1}$, and
$s'' = T_{pc'',v}$.

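Case: The transition $\Xi_{i'-1} \to \Xi_{i'}$ is justified by the rule for spontaneous exception throws.
Then $\mathrm{PC}(\Xi_{i'}) = \mathit{pc}''$ and

$$\mathrm{Context}(i + 1) = \mathit{classID}\text{-}\mathit{methodID} :: \text{err-}\mathit{classID} :: \text{err-}\mathit{pc}'' :: \mathrm{Context}(i')$$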

The  relevant  initial  constraints  are 

{  Err  ^err -pc"  X- pc "•>  N classID  ^err -classID  -^Dispatch(cte5/D,  methodID)  ^ classID - 

methodID  ^classID, methodID’  N classID  ^ methodID  ^classID, methodID  } 

Thus  (T/(GpC(E  j),  classID-methodID  ::  err  -classID  ::  err-PC(Ey,  _  ::  s)  — »  (.v,  s)  for 
some  x ,  where  M(Xpc„)  ( methodID  ::  globals  ::  s)  — »  x . 

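Because $v_0$ has a mapping in the heap, $e' = \mathrm{PC}(\Xi_{i'}) : \text{exn}$ (the other expressions created
do not have heap mappings). Now $\mathrm{PC}(\Xi_{i'}) : \text{exn} \to X_{pc''} \langle \varepsilon \rangle$ and $X_{pc''} \langle \varepsilon \rangle \to X_{pc''}$.

From the program constraints and the assumption $\{ M(M_{\mathit{methodImpl}}) \rhd_c v' \} \subseteq C$, we get
$(v', \mathit{classID}\text{-}\mathit{methodID} :: \text{err-}\mathit{classID} :: \text{err-}\mathrm{PC}(\Xi_{i'-1}) :: \varepsilon) \to (s, \varepsilon)$ for some $s$, where
$X_{pc''} \langle \mathit{methodID} :: c :: \varepsilon \rangle \to s$.

So we set $w = \mathit{classID}\text{-}\mathit{methodID} :: \text{err-}\mathit{classID} :: \text{err-}\mathrm{PC}(\Xi_{i'-1}) :: \varepsilon$, $c = \varepsilon$, and
$s' = s'' = X_{pc''}$. ■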

6.7.5.3  Globals  Hypothesis 

Here we prove the global variables “ground type” invariant that we used to strengthen the
induction hypothesis.

Lemma 6-21. Consider the cases governing the form of $\mathrm{Context}(k + 1)$. For each case we
show

$$(M(G_{\mathrm{PC}(\Xi_{k+1})}), \mathrm{Context}(k + 1)) \to (M(G_{(\mathrm{Main},0)}), \varepsilon)$$

Proof: The proof is by a case analysis of the form of the transition $\Xi_k \to \Xi_{k+1}$.

Case: The transition $\Xi_k \to \Xi_{k+1}$ is justified by the rule for invokestatic.

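Then

$$\mathrm{Context}(k + 1) = \mathrm{PC}(\Xi_k) :: \mathrm{Context}(k)$$

Let $\mathit{methodImpl} = \mathrm{CodeLocMethod}(\mathrm{PC}(\Xi_{k+1}))$. By the induction hypothesis,
$(M(G_{\mathrm{PC}(\Xi_k)}), \mathrm{Context}(k)) \to (M(G_{(\mathrm{Main},0)}), \varepsilon)$. The invokestatic instruction induces the
constraints in N:

$$\{ M_{\mathit{methodImpl}} \preceq_{\mathrm{PC}(\Xi_k)} T_{\mathrm{PC}(\Xi_k), m},\ M_{\mathit{methodImpl}} \rhd_{\text{globals}} G_{\mathrm{PC}(\Xi_{k+1})},\ T_{\mathrm{PC}(\Xi_k), m} \rhd_{\text{globals}} G_{\mathrm{PC}(\Xi_k)} \}$$

By closure of $C$, $\{ M(G_{\mathrm{PC}(\Xi_{k+1})}) \preceq_{\mathrm{PC}(\Xi_k)} M(G_{\mathrm{PC}(\Xi_k)}) \} \subseteq C$.

Therefore

$$(M(G_{\mathrm{PC}(\Xi_{k+1})}), \mathrm{Context}(k + 1)) \to (M(G_{(\mathrm{Main},0)}), \varepsilon)$$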

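Case: The transition is justified by the rule for invokevirtual.

Choose $\mathit{methodID}$ such that $\mathrm{Instruction}(\mathrm{PC}(\Xi_k)) = \text{invokevirtual } \mathit{methodID}$, and
$\mathit{methodImpl}$ such that $\mathrm{PC}(\Xi_{k+1}) = (\mathit{methodImpl}, 0)$. Set $c = \text{globals}$, $i = k$,
$v' = M(G_{\mathrm{PC}(\Xi_{k+1})})$ and $v = M(G_{\mathrm{PC}(\Xi_k)})$. The initial constraints contain

$$\{ M_{\mathit{methodImpl}} \rhd_{\text{globals}} G_{\mathrm{PC}(\Xi_{k+1})},\ T_{\mathrm{PC}(\Xi_k), m} \rhd_{\text{globals}} G_{\mathrm{PC}(\Xi_k)},\ T_{\mathrm{PC}(\Xi_k), v0} \rhd_{\mathit{methodID}} T_{\mathrm{PC}(\Xi_k), m} \}$$

Also, by the induction hypothesis, $(M(G_{\mathrm{PC}(\Xi_k)}), \mathrm{Context}(k)) \to (M(G_{(\mathrm{Main},0)}), \varepsilon)$.

Now we appeal to the preservation of virtual call types (Lemma 6-19) to obtain

$$(M(G_{\mathrm{PC}(\Xi_{k+1})}), \mathrm{Context}(k + 1)) \to (M(G_{(\mathrm{Main},0)}), \varepsilon)$$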

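Case: The transition is justified by the rule for return.

Then

$$\mathrm{Context}(k + 1) = (\mathrm{PC}(\Xi_{k+1}) - 1)\text{-}\mathrm{PC}(\Xi_{k+1}) :: \mathrm{Context}(\mathrm{CallerState}(k))$$

Let $\mathit{pc} = \mathrm{PC}(\Xi_{\mathrm{CallerState}(k)})$. The rule for return implies $\mathrm{PC}(\Xi_{k+1}) = \mathit{pc} + 1$, using an
application of Lemma 6-15 regarding preservation of caller state.

By the induction hypothesis, $(M(G_{pc}), \mathrm{Context}(\mathrm{CallerState}(k))) \to (M(G_{(\mathrm{Main},0)}), \varepsilon)$. The
method invocation instructions both induce the constraints $\mathrm{Succ}(\mathit{pc}, \mathit{pc} + 1, S'_{pc}, L_{pc})$, which
include

$$\{ G_{pc+1} \preceq_{(\mathrm{PC}(\Xi_{k+1}) - 1)\text{-}\mathrm{PC}(\Xi_{k+1})} G_{pc} \}$$

Therefore

$$(M(G_{\mathrm{PC}(\Xi_{k+1})}), \mathrm{Context}(k + 1)) \to (M(G_{(\mathrm{Main},0)}), \varepsilon)$$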

Case: The transition is justified by the rule for exceptional returns.

Then

$$\mathrm{Context}(k + 1) = \mathrm{Context}(\mathrm{CallerState}(k))$$

Let $\mathit{pc} = \mathrm{PC}(\Xi_{\mathrm{CallerState}(k)})$. The rule for exceptional returns implies $\mathrm{PC}(\Xi_{k+1}) = \mathit{pc}$. But
then $M(G_{\mathrm{PC}(\Xi_{k+1})}) = M(G_{\mathrm{PC}(\Xi_{\mathrm{CallerState}(k)})})$; applying the induction hypothesis gives

$$(M(G_{\mathrm{PC}(\Xi_{\mathrm{CallerState}(k)})}), \mathrm{Context}(\mathrm{CallerState}(k))) \to (M(G_{(\mathrm{Main},0)}), \varepsilon)$$

This is identical to the required result, taking the equalities into account.

Case: The transition is justified by a rule for exception throws.

Then

$$\mathrm{Context}(k + 1) = \mathrm{Context}(k)$$

The two exception throw transition rules guarantee $\mathrm{PC}(\Xi_k) = \mathrm{PC}(\Xi_{k+1})$. Therefore
applying the induction hypothesis gives

$$(M(G_{\mathrm{PC}(\Xi_k)}), \mathrm{Context}(k)) \to (M(G_{(\mathrm{Main},0)}), \varepsilon)$$

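Case: All other transitions induce the following rule:

$$\mathrm{Context}(k + 1) = \mathrm{PC}(\Xi_k)\text{-}\mathrm{PC}(\Xi_{k+1}) :: \mathrm{Context}(k)$$

Let $\mathit{pc} = \mathrm{PC}(\Xi_k)$ and $\mathit{pc}' = \mathrm{PC}(\Xi_{k+1})$.

By the induction hypothesis, $(M(G_{pc}), \mathrm{Context}(k)) \to (M(G_{(\mathrm{Main},0)}), \varepsilon)$. The rules for
these transitions all require the execution of an instruction which induces the constraints
$\mathrm{Succ}(\mathit{pc}, \mathit{pc}', S'_{pc}, L_{pc})$, except for the rule for exception catch. The exception catch rule
requires $\mathit{handler} = \mathrm{CatchBlockOffset}((\mathit{method}, \mathit{offset}), \mathrm{HeapObjClass}(\eta(\mathit{ref})))$ where
$\mathit{pc}' = (\mathit{method}, \mathit{handler})$ and $\mathit{pc} = (\mathit{method}, \mathit{offset})$ for some $\mathit{offset}$. But then the
constraints $\mathrm{Succ}(\mathit{pc}, \mathit{pc}', S'_{exn\text{-}pc\text{-}classID}, L_{pc})$ are in the initial constraints. In either case,

$$\{ M(G_{pc'}) \preceq_{pc\text{-}pc'} M(G_{pc}) \} \subseteq C$$

and therefore

$$(M(G_{pc'}), \mathrm{Context}(k + 1)) \to (M(G_{(\mathrm{Main},0)}), \varepsilon)$$ ■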

6.7.5.4  Field  Dereferences 

Now we prove Lemma 6-18 for expressions $e$ of the form $\mathrm{PC}(\Xi_{k+1}) : \mathit{exp}.f$.

The rules for expression evaluation require that for some value of $\mathit{ref}$,
$(\Xi_{k+1}, \mathrm{PC}(\Xi_{k+1}) : \mathit{exp}) \leadsto \mathit{ref}$ and $v = \mathrm{HeapObjFields}(\mathrm{Heap}(\Xi_{k+1})(\mathrm{Val}(\mathit{ref})))(f)$. Let $p$
be defined as

$$p = \min \{\, i \mid v = \mathrm{HeapObjFields}(\mathrm{Heap}(\Xi_i)(\mathrm{Val}(\mathit{ref})))(f) \,\}$$

Note that $p > 0$ because $\mathrm{Heap}(\Xi_0)$ is empty, and $p \le k + 1$. Therefore
$v \neq \mathrm{HeapObjFields}(\mathrm{Heap}(\Xi_{p-1})(\mathrm{Val}(\mathit{ref})))(f)$. Inspection of the tagged transition rules
shows that there are three rules that could change the mapping for $\mathrm{Val}(\mathit{ref})$ from state
$\Xi_{p-1}$ to state $\Xi_p$: the rule for new, the rule for spontaneous exception throws, and the rule
for putfield. In each case, the changed field(s) require

$$f \in \operatorname{dom} \mathrm{InitFields}(\mathrm{HeapObjClass}(\eta(\mathrm{Val}(\mathit{ref}))))$$

Let $\mathit{pc} = \mathrm{PC}(\Xi_{p-1})$.

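We can use the induction hypothesis to obtain

$$\forall v', u', x'.\ (\Xi_{k+1}, \mathrm{PC}(\Xi_{k+1}) : \mathit{exp}) \leadsto v' \land (\mathrm{PC}(\Xi_{k+1}) : \mathit{exp}, \mathrm{Context}(k + 1)) \to (u', x') \Rightarrow$$
$$\exists i', e'.\ \mathrm{Creation}(v') = (i', e') \land i' \le k + 1 \land (e', \mathrm{Context}(i')) \to (u', x')$$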
We have $(\Xi_{k+1}, \mathrm{PC}(\Xi_{k+1}) : \mathit{exp}) \Downarrow \mathit{ref}$. Also,
$(\mathrm{PC}(\Xi_{k+1}) : \mathit{exp}.f, \mathrm{Context}(k+1)) \to (u, x)$ requires, for some $t$, $c$, $t'$,
$\mathrm{PC}(\Xi_{k+1}) : \mathit{exp}.f \to t\langle c \oplus (f :: \varepsilon) \rangle$ where $\mathrm{PC}(\Xi_{k+1}) : \mathit{exp} \to t\langle c \rangle$,
$M(t)\langle c \oplus (f :: \varepsilon) \rangle \to t'$, and $(t', \mathrm{Context}(k+1)) \to (u, x)$.

By Lemma 6-6, there exists $t''$ such that $M(t)\langle c \rangle \to t''$ and $t''\langle f :: \varepsilon \rangle \to t'$, i.e.
$\{t'' \rhd_f t'\} \subseteq C$. By Lemma 6-2, there exist $u''$, $x''$ such that
$(t'', \mathrm{Context}(k+1)) \to (u'', x'')$. Thus $(\mathrm{PC}(\Xi_{k+1}) : \mathit{exp}, \mathrm{Context}(k+1)) \to (u'', x'')$ and
$\mathrm{Creation}(\mathit{ref}) = (i', e') \wedge i' \le k+1 \wedge (e', \mathrm{Context}(i')) \to (u'', x'')$ for some $i'$, $e'$.

The rest of the induction hypothesis is proven using a case split on the form of the transition
$\Xi_{p-1} \to \Xi_p$.

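Case: $\Xi_{p-1} \to \Xi_p$ is justified by the rule for new $\mathit{classID}$, where
$\mathit{classID} = \mathrm{HeapObjClass}(\eta(\mathrm{Val}(\mathit{ref})))$.

Then $(\Xi_p, \mathrm{PC}(\Xi_p) : \text{stack-}0) \Downarrow \mathit{ref}$ and $(\Xi_p, \mathrm{PC}(\Xi_p) : \text{stack-}0.f) \Downarrow v$, giving
$\mathrm{Creation}(v) = (p, \mathrm{PC}(\Xi_p) : \text{stack-}0.f)$ and $\mathrm{Creation}(\mathit{ref}) = (p, \mathrm{PC}(\Xi_p) : \text{stack-}0)$
by definition.

It remains to be shown that $(\mathrm{PC}(\Xi_p) : \text{stack-}0.f, \mathrm{Context}(p)) \to (u, x)$. From above, we
have $\mathrm{Creation}(\mathit{ref}) = (i', e') \wedge i' \le k+1 \wedge (e', \mathrm{Context}(i')) \to (u'', x'')$ for some $i'$, $e'$.
But because Creation is a function (Lemma 6-12), we have $i' = p$ and
$e' = \mathrm{PC}(\Xi_p) : \text{stack-}0$, giving $(\mathrm{PC}(\Xi_p) : \text{stack-}0, \mathrm{Context}(p)) \to (u'', x'')$. Therefore
for some $s$, $\{M(S_{pc+1}) \rhd_{\text{head}} s\} \subseteq C$ and $(s, \mathrm{Context}(p)) \to (u'', x'')$.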

The  new  instruction  induces  these  constraints  in  N : 


{  ^  pc  ^tail  ^pci  ^ pc  +  1  ^head  ^ pc,  v  ’  ^ classID  ^ pc  ^ pc,  v  } 

^  {  Vv  >fTpcj}  K-J  Succ(/?c,/?c+l,  S'pc,  Lpc) 

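These imply $\{S_{pc+1} \rhd_{\text{head}} T_{pc,v},\ T_{pc,v} \rhd_f T_{pc,f}\} \subseteq N$, which in turn imply
$\{M(S_{pc+1}) \rhd_{\text{head}} M(T_{pc,v}),\ M(T_{pc,v}) \rhd_f M(T_{pc,f})\} \subseteq C$. Clearly

$$\mathrm{PC}(\Xi_p) : \text{stack-}0.f \to S_{pc+1}\langle \text{head} :: f :: \varepsilon \rangle$$

$$M(S_{pc+1})\langle \text{head} :: f :: \varepsilon \rangle \to M(T_{pc,f})$$

Thus all that remains to be proved is $(M(T_{pc,f}), \mathrm{Context}(p)) \to (u, x)$.

The facts $\{M(S_{pc+1}) \rhd_{\text{head}} s\} \subseteq C$ and
$\{M(S_{pc+1}) \rhd_{\text{head}} M(T_{pc,v}),\ M(T_{pc,v}) \rhd_f M(T_{pc,f})\} \subseteq C$ give $s = M(T_{pc,v})$ and
$\{s \rhd_f M(T_{pc,f})\} \subseteq C$. Above we showed $(s, \mathrm{Context}(p)) \to (u'', x'')$,
$(t', \mathrm{Context}(k+1)) \to (u, x)$, $(t'', \mathrm{Context}(k+1)) \to (u'', x'')$, and $\{t'' \rhd_f t'\} \subseteq C$. Now
we can invoke the instance convergence property (Lemma 6-8) to obtain the required

$$(M(T_{pc,f}), \mathrm{Context}(p)) \to (u, x)$$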

Case: $\Xi_{p-1} \to \Xi_p$ is justified by the rule for putfield $f$.




Then $(\Xi_{p-1}, \mathrm{PC}(\Xi_{p-1}) : \text{stack-}1) \Downarrow \mathit{ref}$ and $(\Xi_{p-1}, \mathrm{PC}(\Xi_{p-1}) : \text{stack-}0) \Downarrow v$. We
show that $(\mathrm{PC}(\Xi_{p-1}) : \text{stack-}0, \mathrm{Context}(p-1)) \to (u, x)$; the main result then follows
immediately by appealing to the induction hypothesis.

The putfield instruction induces these constraints in $N$:

$$\{ S_{pc} \rhd_{\text{tail}} T_{pc,t},\ S_{pc} \rhd_{\text{head}} T_{pc,v},\ T_{pc,t} \rhd_{\text{tail}} S'_{pc},\ T_{pc,t} \rhd_{\text{head}} T_{pc,obj},\ T_{pc,obj} \rhd_f T_{pc,v} \} \cup \mathrm{Succ}(pc, pc+1, S'_{pc}, L_{pc})$$

Clearly then, $\mathrm{PC}(\Xi_{p-1}) : \text{stack-}1 \to S_{pc}\langle \text{tail} :: \text{head} :: \varepsilon \rangle$ and
$M(S_{pc})\langle \text{tail} :: \text{head} :: \varepsilon \rangle \to M(T_{pc,obj})$. Therefore for some $r$, $z$, we have
$(M(T_{pc,obj}), \mathrm{Context}(p-1)) \to (r, z)$, and
$(\mathrm{PC}(\Xi_{p-1}) : \text{stack-}1, \mathrm{Context}(p-1)) \to (r, z)$. By the induction hypothesis,
$\exists i'', e''.\ \mathrm{Creation}(\mathit{ref}) = (i'', e'') \wedge i'' \le i \wedge (e'', \mathrm{Context}(i'')) \to (r, z)$. But then $i'' = i'$
and $e'' = e'$, and indeed $r = u''$, $z = x''$.

Let $s = M(T_{pc,obj})$. Then $\{s \rhd_f M(T_{pc,v})\} \subseteq C$ and $(s, \mathrm{Context}(p-1)) \to (u'', x'')$.
From the preamble to this section (6.7.5.4), $(t', \mathrm{Context}(k+1)) \to (u, x)$,
$(t'', \mathrm{Context}(k+1)) \to (u'', x'')$, and $\{t'' \rhd_f t'\} \subseteq C$. The instance convergence property
gives $(M(T_{pc,v}), \mathrm{Context}(p-1)) \to (u, x)$. The putfield constraints show
$\mathrm{PC}(\Xi_{p-1}) : \text{stack-}0 \to S_{pc}\langle \text{head} :: \varepsilon \rangle$ and $M(S_{pc})\langle \text{head} :: \varepsilon \rangle \to M(T_{pc,v})$. Putting these
together gives

$$(\mathrm{PC}(\Xi_{p-1}) : \text{stack-}0, \mathrm{Context}(p-1)) \to (u, x)$$

Case: $\Xi_{p-1} \to \Xi_p$ is justified by the rule for spontaneous exception throw.

Then $(\Xi_p, \mathrm{PC}(\Xi_p) : \text{exn}) \Downarrow \mathit{ref}$ and $(\Xi_p, \mathrm{PC}(\Xi_p) : \text{exn}.f) \Downarrow v$, giving
$\mathrm{Creation}(v) = (p, \mathrm{PC}(\Xi_p) : \text{exn}.f)$ and $\mathrm{Creation}(\mathit{ref}) = (p, \mathrm{PC}(\Xi_p) : \text{exn})$ by
definition.

It remains to be shown that $(\mathrm{PC}(\Xi_p) : \text{exn}.f, \mathrm{Context}(p)) \to (u, x)$. From above, we have
$\mathrm{Creation}(\mathit{ref}) = (i', e') \wedge i' \le k+1 \wedge (e', \mathrm{Context}(i')) \to (u'', x'')$ for some $i'$, $e'$. But
because Creation is a function (Lemma 6-12), we have $i' = p$ and $e' = \mathrm{PC}(\Xi_p) : \text{exn}$,
giving $(\mathrm{PC}(\Xi_p) : \text{exn}, \mathrm{Context}(p)) \to (u'', x'')$. Therefore
$(M(W_{pc}), \mathrm{Context}(p)) \to (u'', x'')$.

The  initial  constraints  require  of  N : 

{  Eir  ^err_^,c  W pc,  W pc  ^exn -pc  ^-pc  )  ^  {  ^classID  ^err -classID  ^ 

{  classID  ^f^classIDf  } 

Therefore for some $s'$, $\{M(W_{pc}) \rhd_f s'\} \subseteq C$. Clearly $\mathrm{PC}(\Xi_p) : \text{exn}.f \to W_{pc}\langle f :: \varepsilon \rangle$
and $M(W_{pc})\langle f :: \varepsilon \rangle \to s'$. Thus all that remains to be proved is $(s', \mathrm{Context}(p)) \to (u, x)$.

To recap, I have $\{M(W_{pc}) \rhd_f s'\} \subseteq C$, $(M(W_{pc}), \mathrm{Context}(p)) \to (u'', x'')$,
$(t', \mathrm{Context}(k+1)) \to (u, x)$, $(t'', \mathrm{Context}(k+1)) \to (u'', x'')$, and $\{t'' \rhd_f t'\} \subseteq C$. Now
I can invoke the instance convergence property (Lemma 6-8) to obtain
$(s', \mathrm{Context}(p)) \to (u, x)$, as required.

6.7.5.5  Static  Field  Expressions 

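Suppose $e$ is of the form $\mathrm{PC}(\Xi_{k+1}) : \mathit{staticField}$. Then the rules for expression evaluation
require $v = \mathrm{Globals}(\Xi_{k+1})(\mathit{staticField})$. We also have the assumption
$(\mathrm{PC}(\Xi_{k+1}) : \mathit{staticField}, \mathrm{Context}(k+1)) \to (u, x)$, implying for some $t$,
$\mathrm{PC}(\Xi_{k+1}) : \mathit{staticField} \to G_{\mathrm{PC}(\Xi_{k+1})}\langle \mathit{staticField} :: \varepsilon \rangle$, $M(G_{\mathrm{PC}(\Xi_{k+1})})\langle \mathit{staticField} :: \varepsilon \rangle \to t$,
and $(t, \mathrm{Context}(k+1)) \to (u, x)$.

We have already proven that $(M(G_{\mathrm{PC}(\Xi_{k+1})}), \mathrm{Context}(k+1)) \to (M(G_{(\mathit{Main},0)}), \varepsilon)$. Then
by the component propagation property,

$$\exists v'.\ (t, \mathrm{Context}(k+1)) \to (v', \varepsilon) \wedge \{M(G_{(\mathit{Main},0)}) \rhd_{\mathit{staticField}} v'\} \subseteq C$$

This implies $u = v'$ and $x = \varepsilon$.

Let $p$ be defined as

$$p = \min \{\, i \mid v = \mathrm{Globals}(\Xi_i)(\mathit{staticField}) \,\}$$

Clearly $0 \le p \le k+1$.

If $p = 0$ then, by the definition of Creation and the initial state $\Xi_0$,
$\mathrm{Creation}(v) = (0, (\mathit{Main}, 0) : \mathit{staticField})$. Now
$(\mathit{Main}, 0) : \mathit{staticField} \to M(G_{(\mathit{Main},0)})\langle \mathit{staticField} :: \varepsilon \rangle$,
$M(G_{(\mathit{Main},0)})\langle \mathit{staticField} :: \varepsilon \rangle \to v'$ and $(v', \varepsilon) \to (v', \varepsilon)$; therefore, as required,

$$((\mathit{Main}, 0) : \mathit{staticField}, \mathrm{Context}(0)) \to (v', \varepsilon)$$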

Suppose $p > 0$. Then $\mathrm{Globals}(\Xi_{p-1})(\mathit{staticField}) \ne v$. The only transition which can
change the mapping of $\mathit{staticField}$ is the execution of a putstatic $\mathit{staticField}$ instruction. The
rule for that instruction requires $\Xi_{p-1} = [\text{pc}: pc,\ \text{wstack}: v :: S,\ \rho]$ for some $pc$, $\rho$, $S$.
Therefore $(\Xi_{p-1}, pc : \text{stack-}0) \Downarrow v$.

This instruction induces the constraints

$$\{ S_{pc} \rhd_{\text{tail}} S'_{pc},\ S_{pc} \rhd_{\text{head}} T_{pc,v},\ G_{pc} \rhd_{\mathit{fieldID}} T_{pc,v} \}$$

Therefore $pc : \text{stack-}0 \to S_{pc}\langle \text{head} :: \varepsilon \rangle$ and $M(S_{pc})\langle \text{head} :: \varepsilon \rangle \to M(T_{pc,v})$.

Applying the induction hypothesis gives $(M(G_{pc}), \mathrm{Context}(p-1)) \to (M(G_{(\mathit{Main},0)}), \varepsilon)$.

Then applying the component propagation property with
$\{M(G_{pc}) \rhd_{\mathit{staticField}} M(T_{pc,v})\} \subseteq C$ gives

$$\exists v''.\ (M(T_{pc,v}), \mathrm{Context}(p-1)) \to (v'', \varepsilon) \wedge \{M(G_{(\mathit{Main},0)}) \rhd_{\mathit{staticField}} v''\} \subseteq C$$

Therefore $v'' = v'$. Combining the above gives $(pc : \text{stack-}0, \mathrm{Context}(p-1)) \to (v', \varepsilon)$.
Now we appeal to the induction hypothesis at $p - 1$ to directly obtain the required result.
6.7.5.6 Cases For Simple Expressions

The remaining cases prove the induction result for the simple expressions of the form
stack-$m$, local-$n$ and exn, for each form of transition. The rest of this chapter proves
those cases, ordered by the form of the transition. For most instructions, the strategy is to
map the expression evaluated after transition to an expression evaluated before transition,
and show that their values are the same and their types are suitably related.

6.7.5.7  Reduction  Function 

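For each case, I define a partial function $R : \mathit{BExpRoot} \rightharpoonup \mathit{BExpRoot}$ satisfying the
following conditions:

$$\forall \mathit{exp}, v.\ (\Xi_{k+1}, \mathrm{PC}(\Xi_{k+1}) : \mathit{exp}) \Downarrow v \Rightarrow (\Xi_k, \mathrm{PC}(\Xi_k) : R(\mathit{exp})) \Downarrow v$$

$$\forall \mathit{exp}, u, x.\ (\mathrm{PC}(\Xi_{k+1}) : \mathit{exp}, \mathrm{Context}(k+1)) \to (u, x) \Rightarrow (\mathrm{PC}(\Xi_k) : \mathit{exp}, \mathrm{Context}(k)) \to (u, x)$$

For those $\mathit{exp}$ on which $R$ is defined, we immediately obtain $(\Xi_k, \mathrm{PC}(\Xi_k) : R(\mathit{exp})) \Downarrow v$ and
$(\mathrm{PC}(\Xi_k) : \mathit{exp}, \mathrm{Context}(k)) \to (u, x)$; the required result follows immediately from the
induction hypothesis.

In all the cases, we set $pc = \mathrm{PC}(\Xi_k)$.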


6.7.5.8  Succession  Lemma 

Lemma  6-22.  This  lemma  is  very  helpful  for  showing  the  preservation  of  types  during 
normal  control  flow.  It  states  that  if  an  instruction  does  not  modify  the  value  of  a  stack 
variable  or  local  variable  (implying  that  it  only  transfers  control  within  the  current 
method),  then  the  type  is  preserved. 

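$$\begin{aligned}
&\forall \mathit{exp}, j, S', L'.\ (\mathrm{PC}(\Xi_{k+1}) : \mathit{exp},\ \mathrm{PC}(\Xi_j)\text{-}\mathrm{PC}(\Xi_{k+1}) :: \mathrm{Context}(j)) \to (u, x) \\
&\quad \wedge\ \mathit{exp} \ne \text{exn} \wedge \mathrm{Succ}(\mathrm{PC}(\Xi_j), \mathrm{PC}(\Xi_{k+1}), S', L') \subseteq N \\
&\Rightarrow \exists t, c, s, t'.\ \mathrm{PC}(\Xi_{k+1}) : \mathit{exp} \to t\langle c \rangle \wedge s\langle c \rangle \to t' \wedge (t', \mathrm{Context}(j)) \to (u, x) \\
&\quad \wedge\ s = M(F(\mathit{exp}, S', L'))
\end{aligned}$$

Here $F$ is defined as follows:

$F(\text{stack-}m, S', L') = S'_{pc}$

$F(\text{local-}m, S', L') = L'_{pc}$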

Note  that  F  is  not  defined  for  the  expression  exn;  the  expression  exp  can  only  be  exn  when 
the  abstract  machine  is  in  exception-handling  mode. 

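Proof: By definition, $(\mathrm{PC}(\Xi_{k+1}) : \mathit{exp},\ \mathrm{PC}(\Xi_j)\text{-}\mathrm{PC}(\Xi_{k+1}) :: \mathrm{Context}(j)) \to (u, x)$
requires $\mathrm{PC}(\Xi_{k+1}) : \mathit{exp} \to t\langle c \rangle$, $M(t)\langle c \rangle \to t''$ and
$(t'',\ \mathrm{PC}(\Xi_j)\text{-}\mathrm{PC}(\Xi_{k+1}) :: \mathrm{Context}(j)) \to (u, x)$ for some $t$, $c$, $t''$.

Consider the two cases for $\mathit{exp}$; we show that in both cases, $\{t \le_{\mathrm{PC}(\Xi_j)\text{-}\mathrm{PC}(\Xi_{k+1})} s\} \subseteq N$
where $s = M(F(\mathit{exp}, S'_{pc}, L'_{pc}))$.

Case: $\mathit{exp} = \text{stack-}m$. Then $t = S_{\mathrm{PC}(\Xi_{k+1})}$ and $s = M(S'_{pc})$. We have
$\{S_{\mathrm{PC}(\Xi_{k+1})} \le_{\mathrm{PC}(\Xi_j)\text{-}\mathrm{PC}(\Xi_{k+1})} S'_{pc}\} \in \mathrm{Succ}(\mathrm{PC}(\Xi_j), \mathrm{PC}(\Xi_{k+1}), S', L')$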


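Case: $\mathit{exp} = \text{local-}m$. Then $t = L_{\mathrm{PC}(\Xi_{k+1})}$ and $s = M(L'_{pc})$. We have
$\{L_{\mathrm{PC}(\Xi_{k+1})} \le_{\mathrm{PC}(\Xi_j)\text{-}\mathrm{PC}(\Xi_{k+1})} L'_{pc}\} \in \mathrm{Succ}(\mathrm{PC}(\Xi_j), \mathrm{PC}(\Xi_{k+1}), S', L')$

Now by the instance propagation property (Section 6-10), there exists $t'$ such that
$s\langle c \rangle \to t'$ and $\{t'' \le_{\mathrm{PC}(\Xi_j)\text{-}\mathrm{PC}(\Xi_{k+1})} t'\} \subseteq C$. This implies $(t', \mathrm{Context}(j)) \to (u, x)$, as
required. ■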


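6.7.5.9 Induction Step: load rule

The rule for load gives

$\mathrm{Instruction}(pc) = \text{load } \mathit{index}$

$\Xi_k = [\text{pc}: pc,\ \text{wstack}: S,\ \text{locals}: \mathcal{L},\ \rho]$

$\Xi_{k+1} = [\text{pc}: pc + 1,\ \text{wstack}: \mathcal{L}(\mathit{index}) :: S,\ \text{locals}: \mathcal{L},\ \rho]$.

The function $R$ is:

$R(\text{stack-}m) = \text{stack-}(m - 1)$, for $m > 0$

$R(\text{stack-}0) = \text{local-}\mathit{index}$

$R(\text{local-}n) = \text{local-}n$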
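
The first condition on $R$ (an expression after the step has the same value as its image under $R$ before the step) can be checked on a toy model of the load rule. The sketch below is only an illustration: the list-based stack, the dictionary of locals, and the names `reduce_load`, `step_load`, `evaluate`, and `access_path` are assumptions of this example, not definitions from the thesis.

```python
from typing import Dict, List, Tuple

Expr = Tuple[str, int]  # ("stack", m) or ("local", n); exn is deliberately left out


def reduce_load(exp: Expr, index: int) -> Expr:
    """R for `load index`: map an expression after the step to one before it."""
    kind, n = exp
    if kind == "stack":
        return ("local", index) if n == 0 else ("stack", n - 1)
    if kind == "local":
        return exp
    raise ValueError(f"R is not defined for {exp!r}")


def evaluate(exp: Expr, wstack: List[str], local_vars: Dict[int, str]) -> str:
    """Value of stack-m (index 0 is the top of the stack) or local-n."""
    kind, n = exp
    return wstack[n] if kind == "stack" else local_vars[n]


def step_load(wstack: List[str], local_vars: Dict[int, str], index: int):
    """Toy load rule: push the value of local `index`; locals are unchanged."""
    return [local_vars[index]] + wstack, dict(local_vars)


def access_path(m: int) -> List[str]:
    """Component path used for stack-m: m occurrences of 'tail', then 'head'."""
    return ["tail"] * m + ["head"]


before_stack, before_locals = ["a", "b"], {0: "x", 1: "y"}
after_stack, after_locals = step_load(before_stack, before_locals, index=1)

checked = []
for exp in [("stack", 0), ("stack", 1), ("stack", 2), ("local", 0), ("local", 1)]:
    after_value = evaluate(exp, after_stack, after_locals)
    before_value = evaluate(reduce_load(exp, 1), before_stack, before_locals)
    checked.append(after_value == before_value)
```

The helper `access_path` mirrors the paths $c = \text{tail} :: \ldots :: \text{tail} :: \text{head} :: \varepsilon$ that the type-level half of the argument manipulates; dropping one leading "tail" corresponds to moving from stack-$m$ to stack-$(m-1)$.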


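Now consider the different cases for $\mathit{exp}$. Because $R$ is defined for all stack-$m$ and
local-$n$, this proof suffices to guarantee the induction hypothesis. Note that $\mathit{exp}$ cannot
be exn since the machine is in state RUNNING.

$N$ contains the constraints

$$\{ L_{pc} \rhd_{\mathit{index}} T_{pc,v},\ S'_{pc} \rhd_{\text{tail}} S_{pc},\ S'_{pc} \rhd_{\text{head}} T_{pc,v} \} \cup \mathrm{Succ}(pc, pc+1, S'_{pc}, L_{pc})$$

We also have $\mathrm{Context}(k+1) = pc\text{-}(pc+1) :: \mathrm{Context}(k)$ and therefore
$(\mathrm{PC}(\Xi_{k+1}) : \mathit{exp},\ pc\text{-}(pc+1) :: \mathrm{Context}(k)) \to (u, x)$. This implies that

$$\begin{aligned}
&\exists t, c, s, t'.\ \mathrm{PC}(\Xi_{k+1}) : \mathit{exp} \to t\langle c \rangle \wedge s\langle c \rangle \to t' \wedge (t', \mathrm{Context}(k)) \to (u, x) \\
&\quad \wedge\ s = M(F(\mathit{exp}, S'_{pc}, L_{pc}, G_{pc}))
\end{aligned}$$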

Case: $\mathit{exp} = \text{stack-}m$, $m > 0$. Then $R(\mathit{exp}) = \text{stack-}(m - 1)$.

The evaluation rules show $\mathcal{L}(\mathit{index}) :: S$ is of the form $v_0 :: \ldots :: v_m :: S'$ where $v_m = v$.
Therefore $S = v_1 :: \ldots :: v_m :: S'$ and $(\Xi_k, pc : \text{stack-}(m - 1)) \Downarrow v_m = v$, as required.

In this case we apply the succession lemma (6-22) with $t = S_{pc+1}$ and
$c = \text{tail} :: \ldots :: \text{tail} :: \text{head} :: \varepsilon$, with $m$ occurrences of "tail". Also, $s = M(S'_{pc})$. Therefore
$M(S'_{pc})\langle c \rangle \to t'$ where $(t', \mathrm{Context}(k)) \to (u, x)$; this implies $M(S_{pc})\langle c' \rangle \to t'$, where
$c = \text{tail} :: c'$. The sequence $c'$ has $m - 1$ tails, therefore
$\mathrm{PC}(\Xi_k) : \text{stack-}(m - 1) \to M(S_{pc})\langle c' \rangle$. All together then,
$(\mathrm{PC}(\Xi_k) : \text{stack-}(m - 1), \mathrm{Context}(k)) \to (u, x)$ as required.

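Case: $\mathit{exp} = \text{stack-}0$. Then $R(\mathit{exp}) = \text{local-}\mathit{index}$.

The evaluation rules show $\mathcal{L}(\mathit{index}) :: S$ is of the form $v_0 :: \ldots :: v_m :: S'$ where
$v_0 = v = \mathcal{L}(\mathit{index})$. Therefore $(\Xi_k, pc : \text{local-}\mathit{index}) \Downarrow \mathcal{L}(\mathit{index}) = v$, as required.

In this case $t = S_{pc+1}$ and $c = \text{head} :: \varepsilon$. Also, $s = M(S'_{pc})$. Therefore
$M(S'_{pc})\langle c \rangle \to M(T_{pc,v})$, i.e. $t' = M(T_{pc,v})$. This, plus the constraints in $N$, implies
$M(L_{pc})\langle \mathit{index} :: \varepsilon \rangle \to t'$. Also, $\mathrm{PC}(\Xi_k) : \text{local-}\mathit{index} \to M(L_{pc})\langle \mathit{index} :: \varepsilon \rangle$; all together
then, $(\mathrm{PC}(\Xi_k) : \text{local-}\mathit{index}, \mathrm{Context}(k)) \to (u, x)$ as required.

Case: $\mathit{exp} = \text{local-}n$. Then $R(\mathit{exp}) = \text{local-}n$.

The evaluation rules show $\mathcal{L}(n) = v$. Therefore $(\Xi_k, pc : \text{local-}n) \Downarrow \mathcal{L}(n) = v$, as
required.

In this case $t = L_{pc+1}$ and $c = n :: \varepsilon$. Also, $s = M(L_{pc})$. Therefore $M(L_{pc})\langle c \rangle \to t'$.
Also, $\mathrm{PC}(\Xi_k) : \text{local-}n \to M(L_{pc})\langle c \rangle$; all together then,
$(\mathrm{PC}(\Xi_k) : \text{local-}n, \mathrm{Context}(k)) \to (u, x)$ as required.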

6.7.5.10  Induction  Step:  store  rule 
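The rule for store gives

$\mathrm{Instruction}(pc) = \text{store } \mathit{index}$

$\Xi_k = [\text{pc}: pc,\ \text{wstack}: v' :: S,\ \text{locals}: \mathcal{L},\ \rho]$

$\Xi_{k+1} = [\text{pc}: pc + 1,\ \text{wstack}: S,\ \text{locals}: \mathcal{L}[\mathit{index}: v'],\ \rho]$.

The function $R$ is:

$R(\text{stack-}m) = \text{stack-}(m + 1)$

$R(\text{local-}\mathit{index}) = \text{stack-}0$

$R(\text{local-}n) = \text{local-}n$, for $n \ne \mathit{index}$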
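
For store, $R$ is total on every expression other than exn, which the following sketch makes explicit by returning `None` only for exn. As before, this is a toy model under stated assumptions: the data representation and the names `reduce_store`, `step_store`, and `evaluate` are hypothetical and chosen for this example.

```python
from typing import Dict, List, Optional, Tuple

Expr = Tuple[str, int]  # ("stack", m), ("local", n) or ("exn", 0)


def reduce_store(exp: Expr, index: int) -> Optional[Expr]:
    """R for `store index`; None marks the one expression (exn) where R is undefined."""
    kind, n = exp
    if kind == "stack":
        return ("stack", n + 1)  # the popped value sat on top before the step
    if kind == "local":
        return ("stack", 0) if n == index else ("local", n)
    return None


def evaluate(exp: Expr, wstack: List[str], local_vars: Dict[int, str]) -> str:
    kind, n = exp
    return wstack[n] if kind == "stack" else local_vars[n]


def step_store(wstack: List[str], local_vars: Dict[int, str], index: int):
    """Toy store rule: pop the top of the stack into local `index`."""
    top, *rest = wstack
    updated = dict(local_vars)
    updated[index] = top
    return rest, updated


before_stack, before_locals = ["top", "a"], {0: "x", 1: "y"}
after_stack, after_locals = step_store(before_stack, before_locals, index=0)

preserved = {
    exp: evaluate(exp, after_stack, after_locals)
    == evaluate(reduce_store(exp, 0), before_stack, before_locals)
    for exp in [("stack", 0), ("local", 0), ("local", 1)]
}
```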

Now  consider  the  different  cases  for  exp.  Because  R  is  defined  for  all  BExpRoots  other  than 
exn,  this  proof  suffices  to  guarantee  the  induction  hypothesis. 

N  contains  the  constraints 

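$$\begin{aligned}
&\{ S_{pc} \rhd_{\text{tail}} S'_{pc},\ S_{pc} \rhd_{\text{head}} T_{pc,v},\ L'_{pc} \rhd_{\mathit{index}} T_{pc,v} \} \\
&\cup \{ L_{pc} \rhd_i T_{pc,i} \mid i \in \mathrm{LocalNames}(pc) \wedge i \ne \mathit{index} \} \\
&\cup \{ L'_{pc} \rhd_i T_{pc,i} \mid i \in \mathrm{LocalNames}(pc) \wedge i \ne \mathit{index} \} \cup \mathrm{Succ}(pc, pc+1, S'_{pc}, L'_{pc})
\end{aligned}$$

We also have $\mathrm{Context}(k+1) = pc\text{-}(pc+1) :: \mathrm{Context}(k)$ and therefore
$(\mathrm{PC}(\Xi_{k+1}) : \mathit{exp},\ pc\text{-}(pc+1) :: \mathrm{Context}(k)) \to (u, x)$. This implies that

$$\begin{aligned}
&\exists t, c, s, t'.\ \mathrm{PC}(\Xi_{k+1}) : \mathit{exp} \to t\langle c \rangle \wedge s\langle c \rangle \to t' \wedge (t', \mathrm{Context}(k)) \to (u, x) \\
&\quad \wedge\ s = M(F(\mathit{exp}, S'_{pc}, L'_{pc}, G_{pc}))
\end{aligned}$$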
Case: $\mathit{exp} = \text{stack-}m$. Then $R(\mathit{exp}) = \text{stack-}(m + 1)$.

The evaluation rules show $S$ is of the form $v_0 :: \ldots :: v_m :: S'$ where $v_m = v$. Therefore
$\mathrm{Stack}(\Xi_k) = v' :: v_0 :: \ldots :: v_m :: S'$ and $(\Xi_k, pc : \text{stack-}(m + 1)) \Downarrow v_m = v$, as required.

In this case I apply the succession lemma (6-22) with $t = S_{pc+1}$ and
$c = \text{tail} :: \ldots :: \text{tail} :: \text{head} :: \varepsilon$, with $m$ occurrences of "tail". Also, $s = M(S'_{pc})$. Therefore
$M(S'_{pc})\langle c \rangle \to t'$; this implies $M(S_{pc})\langle c' \rangle \to t'$, where $c' = \text{tail} :: c$. The sequence $c'$ has
$m + 1$ tails, therefore $\mathrm{PC}(\Xi_k) : \text{stack-}(m + 1) \to M(S_{pc})\langle c' \rangle$. All together then,
$(\mathrm{PC}(\Xi_k) : \text{stack-}(m + 1), \mathrm{Context}(k)) \to (u, x)$ as required.

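Case: $\mathit{exp} = \text{local-}\mathit{index}$. Then $R(\mathit{exp}) = \text{stack-}0$.

The evaluation rules show $v' = v$. Therefore $(\Xi_k, pc : \text{stack-}0) \Downarrow v' = v$, as required.

I apply the succession lemma (6-22) with $t = L_{pc+1}$ and $c = \mathit{index} :: \varepsilon$. Also,
$s = M(L'_{pc})$. Therefore $M(L'_{pc})\langle c \rangle \to t'$, i.e. $t' = M(T_{pc,v})$. This, plus the constraints in
$N$, implies $M(S_{pc})\langle \text{head} :: \varepsilon \rangle \to t'$. Also, $\mathrm{PC}(\Xi_k) : \text{stack-}0 \to M(S_{pc})\langle \text{head} :: \varepsilon \rangle$; all
together then, $(\mathrm{PC}(\Xi_k) : \text{stack-}0, \mathrm{Context}(k)) \to (u, x)$ as required.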

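Case: $\mathit{exp} = \text{local-}n$, where $n \ne \mathit{index}$. Then $R(\mathit{exp}) = \text{local-}n$.

The evaluation rules show $\mathcal{L}(n) = v$. Therefore $(\Xi_k, pc : \text{local-}n) \Downarrow \mathcal{L}(n) = v$, as
required.

In this case $t = L_{pc+1}$ and $c = n :: \varepsilon$. Also, $s = M(L'_{pc})$. Therefore $M(L'_{pc})\langle c \rangle \to t'$
and $t' = M(T_{pc,n})$. This, plus the constraints in $N$, implies $M(L_{pc})\langle n :: \varepsilon \rangle \to t'$. Also,
$\mathrm{PC}(\Xi_k) : \text{local-}n \to M(L_{pc})\langle n :: \varepsilon \rangle$; all together then,
$(\mathrm{PC}(\Xi_k) : \text{local-}n, \mathrm{Context}(k)) \to (u, x)$ as required.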

6.7.5.11  Induction  Step:  new  rule 

The rule for new gives

$\mathrm{Instruction}(pc) = \text{new } \mathit{classID}$

$\Xi_k = [\text{pc}: pc,\ \text{wstack}: S,\ \text{locals}: \mathcal{L},\ \rho]$

$\Xi_{k+1} = [\text{pc}: pc + 1,\ \text{wstack}: \mathit{ref} :: S,\ \text{locals}: \mathcal{L},\ \rho]$

The function $R$ is:

$R(\text{stack-}m) = \text{stack-}(m - 1)$, for $m > 0$

$R(\text{stack-}0)$ is undefined

$R(\text{local-}n) = \text{local-}n$

For the expressions on which $R$ is defined, the proof of $R$'s correctness is identical to the
cases for load, and is not repeated here.

For $\mathit{exp} = \text{stack-}0$, $\mathrm{Creation}(v) = (k + 1, (pc + 1) : \text{stack-}0)$ by the definition of
Creation; thus the induction result is trivially satisfied.
6.7.5.12 Induction Step: aconst_null rule

The  proof  for  this  case  is  the  same  as  for  the  new  rule. 

6.7.5.13  Induction  Step:  bipush  rule 

The  proof  for  this  case  is  the  same  as  for  the  new  rule. 

6.7.5.14  Induction  Step:  rule  for  spontaneous  exception  throw 

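The rule for spontaneous exception throw gives

$\mathit{classID} \in \mathit{ErrorClassIDs}$

$\Xi_k = [\text{mode}: \text{RUNNING},\ \text{pc}: pc,\ \text{wstack}: S,\ \text{locals}: \mathcal{L},\ \rho]$

$\Xi_{k+1} = [\text{mode}: \text{THROWING},\ \text{pc}: pc,\ \text{wstack}: \mathit{ref} :: S,\ \text{locals}: \mathcal{L},\ \rho]$.

Furthermore $\mathrm{Context}(k) = \mathrm{Context}(k+1)$.

The function $R$ is:

$R(\text{stack-}m)$ is undefined

$R(\text{exn})$ is undefined

$R(\text{local-}n) = \text{local-}n$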

Case: $\mathit{exp} = \text{stack-}m$.

This case cannot occur because stack expressions do not evaluate to anything in the
Throwing state.

Case: $\mathit{exp} = \text{local-}n$. Then $R(\mathit{exp}) = \text{local-}n$.

The evaluation rules show $\mathcal{L}(n) = v$. Therefore $(\Xi_k, pc : \text{local-}n) \Downarrow \mathcal{L}(n) = v$.
Furthermore, since $\mathrm{PC}(\Xi_k) = pc = \mathrm{PC}(\Xi_{k+1})$ and $\mathrm{Context}(k) = \mathrm{Context}(k+1)$,
$(\mathrm{PC}(\Xi_k) : \text{local-}n, \mathrm{Context}(k)) \to (u, x)$. The result then follows from the induction
hypothesis.

Case: $\mathit{exp} = \text{exn}$.

$R$ is undefined for $pc : \text{exn}$. However $(\Xi_{k+1}, pc : \text{exn}) \Downarrow v$ implies
$\mathrm{Creation}(v) = (k + 1, pc : \text{exn})$; thus the induction result is trivially satisfied.

6.7.5.15  Induction  Step:  invokestatic  rule 
The rule for invokestatic gives

$\mathrm{Instruction}(pc) = \text{invokestatic } \mathit{methodImpl}$

$\Xi_k = [\text{pc}: pc,\ \text{wstack}: v_1 :: v_0 :: S,\ \text{locals}: \mathcal{L},\ \text{mstack}: \mu,\ \rho]$

$\Xi_{k+1} = [\text{pc}: pc',\ \text{wstack}: \varepsilon,\ \text{locals}: [0: v_0, 1: v_1],\ \text{mstack}: (pc, S, \mathcal{L}) :: \mu,\ \rho]$

$pc' = (\mathit{methodImpl}, 0)$

Furthermore, $\mathrm{Context}(k+1) = pc :: \mathrm{Context}(k)$. The induced constraints include

$$\begin{aligned}
\{\ & S_{pc} \rhd_{\text{tail}} T_{pc,t1},\ S_{pc} \rhd_{\text{head}} T_{pc,v1},\ T_{pc,t1} \rhd_{\text{tail}} T_{pc,t2},\ T_{pc,t1} \rhd_{\text{head}} T_{pc,v0}, \\
& M_{\mathit{methodImpl}} \le_{pc} T_{pc,m},\ T_{pc,m} \rhd_{\text{param-}0} T_{pc,v0},\ T_{pc,m} \rhd_{\text{param-}1} T_{pc,v1}\ \}
\end{aligned}$$

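The initial constraints also contain

$$\begin{aligned}
\{\ & M_{\mathit{methodImpl}} \rhd_{\text{param-}0} T_{\mathit{methodImpl},p0},\ M_{\mathit{methodImpl}} \rhd_{\text{param-}1} T_{\mathit{methodImpl},p1}, \\
& L_{pc'} \rhd_0 T_{\mathit{methodImpl},p0},\ L_{pc'} \rhd_1 T_{\mathit{methodImpl},p1}\ \}
\end{aligned}$$

The function $R$ is:

$R(\text{local-}n) = \text{stack-}(1 - n)$, for $0 \le n \le 1$

$R(\mathit{exp})$ is undefined otherwise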
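
The two-argument call rule and its reduction function can be pictured with a short sketch. It models only what the rule above states: the two topmost caller stack entries become callee locals 0 and 1, the callee starts with an empty working stack, and the call site is prepended to the context. The representation and the names `reduce_invokestatic` and `step_invokestatic` are assumptions of this example.

```python
from typing import Dict, List, Optional, Tuple

Expr = Tuple[str, int]      # ("stack", m) or ("local", n)
CodeLoc = Tuple[str, int]   # hypothetical (method name, offset) pair


def reduce_invokestatic(exp: Expr) -> Optional[Expr]:
    """R for invokestatic: local-n in the callee is stack-(1 - n) in the caller."""
    kind, n = exp
    if kind == "local" and 0 <= n <= 1:
        return ("stack", 1 - n)
    return None  # R is undefined for every other expression


def step_invokestatic(pc: CodeLoc, wstack: List[str], context: List[CodeLoc]):
    """Toy call rule: pop v1 and v0, start the callee with an empty working stack."""
    v1, v0, *_rest = wstack
    callee_locals: Dict[int, str] = {0: v0, 1: v1}
    return [], callee_locals, [pc] + context


caller_stack = ["arg1", "arg0", "below"]
callee_stack, callee_locals, callee_context = step_invokestatic(
    ("caller", 3), caller_stack, []
)

matches = []
for n in (0, 1):
    kind, m = reduce_invokestatic(("local", n))
    matches.append(callee_locals[n] == caller_stack[m])
```

Since the callee's working stack is empty immediately after the call, no stack expression evaluates there, which is why the reduction only needs to cover local-0 and local-1.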


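Case: $\mathit{exp} = \text{stack-}m$. This case cannot occur because $\mathrm{WStack}(\Xi_{k+1}) = \varepsilon$.

Case: $\mathit{exp} = \text{local-}n$. Then $R(\mathit{exp}) = \text{stack-}(1 - n)$.

In this case $n$ must be 0 or 1 and $v = v_n$. Then the evaluation rules show that
$(\Xi_k, pc : \text{stack-}(1 - n)) \Downarrow v_n = v$.

Now, $(pc' : \text{local-}n, \mathrm{Context}(k+1)) \to (u, x)$ implies that
$(M(T_{\mathit{methodImpl},pn}),\ pc :: \mathrm{Context}(k)) \to (u, x)$. Combining this with

$$\{ M(M_{\mathit{methodImpl}}) \rhd_{\text{param-}n} M(T_{\mathit{methodImpl},pn}),\ M(M_{\mathit{methodImpl}}) \le_{pc} M(T_{pc,m}),\ M(T_{pc,m}) \rhd_{\text{param-}n} M(T_{pc,vn}) \} \subseteq C$$

gives $\{M(T_{\mathit{methodImpl},pn}) \le_{pc} M(T_{pc,vn})\} \subseteq C$. Therefore
$(M(T_{pc,vn}), \mathrm{Context}(k)) \to (u, x)$.

If $n = 0$ then $pc : \text{stack-}(1 - n) \to S_{pc}\langle \text{tail} :: \text{head} :: \varepsilon \rangle$ and
$M(S_{pc})\langle \text{tail} :: \text{head} :: \varepsilon \rangle \to M(T_{pc,v0})$. Otherwise $n = 1$,
$pc : \text{stack-}(1 - n) \to S_{pc}\langle \text{head} :: \varepsilon \rangle$ and $M(S_{pc})\langle \text{head} :: \varepsilon \rangle \to M(T_{pc,v1})$. Either way,
$(pc : \text{stack-}(1 - n), \mathrm{Context}(k)) \to (u, x)$. The result then follows directly from the
induction hypothesis.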


6.7.5.16  Induction  Step:  invokevirtual  rule 
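The rule for invokevirtual gives

$\mathrm{Instruction}(pc) = \text{invokevirtual } \mathit{methodID}$

$\Xi_k = [\text{pc}: pc,\ \text{wstack}: v_1 :: v_0 :: S,\ \text{locals}: \mathcal{L},\ \text{mstack}: \mu,\ \rho]$

$\Xi_{k+1} = [\text{pc}: pc',\ \text{wstack}: \varepsilon,\ \text{locals}: [0: v_0, 1: v_1],\ \text{mstack}: (pc, S, \mathcal{L}) :: \mu,\ \rho]$

where $pc' = (\mathit{methodImpl}, 0)$.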

The induced constraints include

$$\begin{aligned}
\{\ & S_{pc} \rhd_{\text{tail}} T_{pc,t1},\ S_{pc} \rhd_{\text{head}} T_{pc,v1},\ T_{pc,t1} \rhd_{\text{tail}} T_{pc,t2},\ T_{pc,t1} \rhd_{\text{head}} T_{pc,v0}, \\
& T_{pc,v0} \rhd_{\mathit{methodID}} T_{pc,m},\ S'_{pc} \rhd_{\text{tail}} T_{pc,t2},\ S'_{pc} \rhd_{\text{head}} T_{pc,r}, \\
& T_{pc,m} \rhd_{\text{param-}0} T_{pc,v0},\ T_{pc,m} \rhd_{\text{param-}1} T_{pc,v1}\ \}
\end{aligned}$$
The initial constraints also contain

$$\begin{aligned}
\{\ & M_{\mathit{methodImpl}} \rhd_{\text{param-}0} T_{\mathit{methodImpl},p0},\ M_{\mathit{methodImpl}} \rhd_{\text{param-}1} T_{\mathit{methodImpl},p1}, \\
& L_{pc'} \rhd_0 T_{\mathit{methodImpl},p0},\ L_{pc'} \rhd_1 T_{\mathit{methodImpl},p1}\ \}
\end{aligned}$$

The function $R$ is:

$R(\text{local-}n) = \text{stack-}(1 - n)$, for $0 \le n \le 1$

$R(\mathit{exp})$ is undefined otherwise


Case: $\mathit{exp} = \text{stack-}m$. This case cannot occur because $\mathrm{WStack}(\Xi_{k+1}) = \varepsilon$.

Case: $\mathit{exp} = \text{local-}n$. Then $R(\mathit{exp}) = \text{stack-}(1 - n)$.

In this case $n$ must be 0 or 1 and $v = v_n$. Then the evaluation rules show that
$(\Xi_k, pc : \text{stack-}(1 - n)) \Downarrow v_n = v$.

Now, $(pc' : \text{local-}n, \mathrm{Context}(k+1)) \to (u, x)$ implies that
$(M(T_{\mathit{methodImpl},pn}), \mathrm{Context}(k+1)) \to (u, x)$. Apply the preservation of virtual call types
lemma, setting $c = \text{param-}n$, $v = T_{pc,vn}$ and $v' = T_{\mathit{methodImpl},pn}$, giving
$(M(T_{pc,vn}), \mathrm{Context}(k)) \to (u, x)$.

If $n = 0$ then $pc : \text{stack-}(1 - n) \to S_{pc}\langle \text{tail} :: \text{head} :: \varepsilon \rangle$ and
$M(S_{pc})\langle \text{tail} :: \text{head} :: \varepsilon \rangle \to M(T_{pc,v0})$. Otherwise $n = 1$,
$pc : \text{stack-}(1 - n) \to S_{pc}\langle \text{head} :: \varepsilon \rangle$ and $M(S_{pc})\langle \text{head} :: \varepsilon \rangle \to M(T_{pc,v1})$. Either way,
$(pc : \text{stack-}(1 - n), \mathrm{Context}(k)) \to (u, x)$. The result then follows directly from the
induction hypothesis.

6.7.5.17  Induction  Step:  return  rule 
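The rule for return gives

$\mathrm{Instruction}(pc) = \text{return}$

$\Xi_k = [\text{pc}: pc,\ \text{wstack}: v' :: S,\ \text{locals}: \mathcal{L},\ \text{mstack}: (pc'', S', \mathcal{L}') :: \mu,\ \rho]$

$\Xi_{k+1} = [\text{pc}: pc'' + 1,\ \text{wstack}: v' :: S',\ \text{locals}: \mathcal{L}',\ \text{mstack}: \mu,\ \rho]$

Let $c = \mathrm{CallerState}(k)$ and $pc' = \mathrm{PC}(\Xi_c)$. The transition $\Xi_c \to \Xi_{c+1}$ must be an
application of invokestatic or invokevirtual, because only those rules extend $\mu$.
Therefore $\mathrm{Instruction}(pc') = \text{invokevirtual } \mathit{methodID}$ or
$\mathrm{Instruction}(pc') = \text{invokestatic } \mathit{methodImpl}$. In the latter case,
$\mathit{methodImpl} = \mathrm{CodeLocMethod}(\mathrm{PC}(\Xi_{c+1}))$; in the former case, define
$\mathit{methodImpl} = \mathrm{CodeLocMethod}(\mathrm{PC}(\Xi_{c+1}))$.

In either case, $N$ contains the constraints

$$\begin{aligned}
\{\ & S_{pc'} \rhd_{\text{tail}} T_{pc',t1},\ S_{pc'} \rhd_{\text{head}} T_{pc',v1},\ T_{pc',t1} \rhd_{\text{tail}} T_{pc',t2},\ T_{pc',t1} \rhd_{\text{head}} T_{pc',v0}, \\
& S'_{pc'} \rhd_{\text{tail}} T_{pc',t2},\ S'_{pc'} \rhd_{\text{head}} T_{pc',r}\ \} \cup \mathrm{MethodCall}(T_{pc',m}, T_{pc',v0}, T_{pc',v1}, G_{pc'}, W_{pc'}, T_{pc',r}) \\
& \cup\ \mathrm{Succ}(pc', pc' + 1, S'_{pc'}, L_{pc'})
\end{aligned}$$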
Note also that $\mathrm{Context}(k+1) = pc'\text{-}(pc' + 1) :: \mathrm{Context}(c)$.

By the lemma governing preservation of caller state (Lemma 6-15),
$\Xi_c = [\text{pc}: pc'',\ \text{wstack}: v'_1 :: v'_0 :: S',\ \text{locals}: \mathcal{L}',\ \text{mstack}: \mu,\ \rho]$. This implies $pc' = pc''$.

Case: $\mathit{exp} = \text{local-}n$ for some $n$.

Then $v = \mathcal{L}'(n)$, and therefore $(\Xi_c, pc' : \mathit{exp}) \Downarrow v$.

In this case I apply the succession lemma (6-22) at $j = c$ with $t = L_{pc'+1}$ and $c = n :: \varepsilon$.
Also, $s = M(L_{pc'})$. Therefore $M(L_{pc'})\langle c \rangle \to t'$. Also, $\mathrm{PC}(\Xi_c) : \text{local-}n \to M(L_{pc'})\langle c \rangle$;
all together then, $(\mathrm{PC}(\Xi_c) : \text{local-}n, \mathrm{Context}(c)) \to (u, x)$. Applying the induction
hypothesis setting $i = c$ gives the required result.

Case: $\mathit{exp} = \text{stack-}m$ for some $m > 0$.

The evaluation rules show $v' :: S'$ is of the form $v_0 :: \ldots :: v_m :: S''$ where $v_m = v$.
Therefore $S' = v_1 :: \ldots :: v_m :: S''$. Now
$\mathrm{WStack}(\Xi_c) = v'_1 :: v'_0 :: S' = v'_1 :: v'_0 :: v_1 :: \ldots :: v_m :: S''$; therefore
$(\Xi_c, pc' : \text{stack-}(m + 1)) \Downarrow v_m = v$.

We apply the succession lemma (6-22) at $j = c$ with $t = S_{pc'+1}$ and
$c = \text{tail} :: \ldots :: \text{tail} :: \text{head} :: \varepsilon$, with $m$ occurrences of "tail". Also, $s = M(S'_{pc'})$.
Therefore $M(S'_{pc'})\langle c \rangle \to t'$. This implies $M(S_{pc'})\langle c' \rangle \to t'$, where $c' = \text{tail} :: c$.
Therefore $\mathrm{PC}(\Xi_c) : \text{stack-}(m + 1) \to M(S_{pc'})\langle c' \rangle$. All together then,
$(\mathrm{PC}(\Xi_c) : \text{stack-}(m + 1), \mathrm{Context}(c)) \to (u, x)$.

Applying the induction hypothesis setting $i = c$ gives the required result.

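Case: $\mathit{exp} = \text{stack-}0$.

Then $v = v'$, and therefore $(\Xi_k, pc : \text{stack-}0) \Downarrow v$. I will prove that
$(pc : \text{stack-}0, \mathrm{Context}(k)) \to (u, x)$; the correctness of this case then follows immediately
using the induction hypothesis.

From $(\mathrm{PC}(\Xi_{k+1}) : \text{stack-}0, \mathrm{Context}(k+1)) \to (u, x)$ and the induced constraints, it
follows that $\mathrm{PC}(\Xi_{k+1}) : \text{stack-}0 \to S_{pc'+1}\langle \text{head} :: \varepsilon \rangle$, $M(S_{pc'+1})\langle \text{head} :: \varepsilon \rangle \to t$ and
$(t, \mathrm{Context}(k+1)) \to (u, x)$, for some $t$.

We also have $\{M(S_{pc'+1}) \le_{pc'\text{-}(pc'+1)} M(S'_{pc'}),\ M(S'_{pc'}) \rhd_{\text{head}} M(T_{pc',r})\} \subseteq C$ by the
induced constraints. Therefore $\{t \le_{pc'\text{-}(pc'+1)} M(T_{pc',r})\} \subseteq C$ and then
$(M(T_{pc',r}), \mathrm{Context}(c)) \to (u, x)$.

We apply the preservation of return types lemma (Section 6.7.4.2) at $i = k$, obtaining
$\exists w.\ \mathrm{Context}(k) = w \oplus \mathrm{Context}(c + 1) \wedge (M(R_{pc}), w) \to (M(R_{(\mathit{methodImpl},0)}), \varepsilon)$.

Now $pc : \text{stack-}0 \to S_{pc}\langle \text{head} :: \varepsilon \rangle$. The constraint induced by the return instruction is
$\{M(S_{pc}) \rhd_{\text{head}} M(R_{pc})\} \subseteq C$, i.e. $M(S_{pc})\langle \text{head} :: \varepsilon \rangle \to M(R_{pc})$. We just obtained
$(M(R_{pc}), w) \to (M(R_{(\mathit{methodImpl},0)}), \varepsilon)$. All that remains to be shown is

$$(M(R_{(\mathit{methodImpl},0)}), \mathrm{Context}(c + 1)) \to (u, x).$$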
Consider the case in which the method was invoked by invokestatic. Then
$\mathrm{Context}(c + 1) = pc' :: \mathrm{Context}(c)$. The constraints $\{ M_{\mathit{methodImpl}} \le_{pc'} T_{pc',m},\ T_{pc',m} \rhd_{\text{result}} T_{pc',r} \}$
are induced by the rule for invokestatic. Therefore
$\{M(R_{(\mathit{methodImpl},0)}) \le_{pc'} M(T_{pc',r})\} \subseteq C$. Combining this with
$(M(T_{pc',r}), \mathrm{Context}(c)) \to (u, x)$ gives $(M(R_{(\mathit{methodImpl},0)}), \mathrm{Context}(c + 1)) \to (u, x)$ as
required.

Consider the case in which the method was invoked by invokevirtual. Choose
$\mathit{methodID}$ such that $\mathrm{Instruction}(pc') = \text{invokevirtual } \mathit{methodID}$. Set $c = \text{result}$,
$i = c$, $v' = M(R_{(\mathit{methodImpl},0)})$ and $v = M(T_{pc',r})$. The initial constraints contain
$\{ M_{\mathit{methodImpl}} \rhd_{\text{result}} R_{(\mathit{methodImpl},0)},\ T_{pc',m} \rhd_{\text{result}} T_{pc',r},\ T_{pc',v0} \rhd_{\mathit{methodID}} T_{pc',m} \}$. Now
we appeal to the preservation of virtual call types (Lemma 6-19), applied to
$(M(T_{pc',r}), \mathrm{Context}(c)) \to (u, x)$, to obtain $(M(R_{(\mathit{methodImpl},0)}), \mathrm{Context}(c + 1)) \to (u, x)$,
as required.

6.7.5.18  Induction  Step:  exceptional  returns 

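The rule for exceptional returns gives

$\Xi_k = [\text{mode}: \text{THROWING},\ \text{pc}: pc,\ \text{wstack}: \mathit{ref} :: S,\ \text{locals}: \mathcal{L},\ \text{mstack}: (pc'', S', \mathcal{L}') :: \mu,\ \rho]$

$\Xi_{k+1} = [\text{mode}: \text{THROWING},\ \text{pc}: pc'',\ \text{wstack}: \mathit{ref} :: S',\ \text{locals}: \mathcal{L}',\ \text{mstack}: \mu,\ \rho]$

Let $c = \mathrm{CallerState}(k)$ and $pc' = \mathrm{PC}(\Xi_c)$. The transition $\Xi_c \to \Xi_{c+1}$ must be an
application of invokestatic or invokevirtual, because only those rules extend $\mu$.
Therefore $\mathrm{Instruction}(pc') = \text{invokevirtual } \mathit{methodID}$ or
$\mathrm{Instruction}(pc') = \text{invokestatic } \mathit{methodImpl}$. In the latter case,
$\mathit{methodImpl} = \mathrm{CodeLocMethod}(\mathrm{PC}(\Xi_{c+1}))$; in the former case, define
$\mathit{methodImpl} = \mathrm{CodeLocMethod}(\mathrm{PC}(\Xi_{c+1}))$. In either case, $N$ contains

$$\begin{aligned}
\{\ & S_{pc'} \rhd_{\text{tail}} T_{pc',t1},\ S_{pc'} \rhd_{\text{head}} T_{pc',v1},\ T_{pc',t1} \rhd_{\text{tail}} T_{pc',t2},\ T_{pc',t1} \rhd_{\text{head}} T_{pc',v0}, \\
& S'_{pc'} \rhd_{\text{tail}} T_{pc',t2},\ S'_{pc'} \rhd_{\text{head}} T_{pc',r}\ \} \cup \mathrm{MethodCall}(T_{pc',m}, T_{pc',v0}, T_{pc',v1}, G_{pc'}, W_{pc'}, T_{pc',r})
\end{aligned}$$

Note also that $\mathrm{Context}(k+1) = \text{err-}pc :: \mathrm{Context}(c)$.

By the lemma governing preservation of caller state (Lemma 6-15),
$\Xi_c = [\text{pc}: pc'',\ \text{wstack}: v'_1 :: v'_0 :: S',\ \text{locals}: \mathcal{L}',\ \text{mstack}: \mu,\ \rho]$. This implies $pc'' = pc'$.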

Case: $\mathit{exp} = \text{stack-}m$.

This case cannot occur because stack expressions do not evaluate to anything in the
Throwing state.

Case: $\mathit{exp} = \text{local-}n$ for some $n$.

Then $v = \mathcal{L}'(n)$, and therefore $(\Xi_c, pc' : \mathit{exp}) \Downarrow v$. From
$(\mathrm{PC}(\Xi_{k+1}) : \text{local-}n, \mathrm{Context}(k+1)) \to (u, x)$, and observing that
$\mathrm{Context}(k+1) = \mathrm{Context}(c)$ and $\mathrm{PC}(\Xi_{k+1}) = \mathrm{PC}(\Xi_c)$, clearly
$(\mathrm{PC}(\Xi_c) : \text{local-}n, \mathrm{Context}(c)) \to (u, x)$. Applying the induction hypothesis setting
$i = c$ gives the required result.

Case: $\mathit{exp} = \text{exn}$.

Then $v = \mathit{ref}$, and therefore $(\Xi_k, pc : \text{exn}) \Downarrow v$. I will prove that
$(pc : \text{exn}, \mathrm{Context}(k)) \to (u, x)$; the correctness of this case then follows immediately
using the induction hypothesis.

From $(\mathrm{PC}(\Xi_{k+1}) : \text{exn}, \mathrm{Context}(k+1)) \to (u, x)$ and the induced constraints, it follows
that $\mathrm{PC}(\Xi_{k+1}) : \text{exn} \to X_{pc'}\langle \varepsilon \rangle$ where $(M(X_{pc'}), \mathrm{Context}(k+1)) \to (u, x)$.

I apply the preservation of return types lemma (Section 6.7.4.2) at $i = k$, obtaining
$\exists w.\ \mathrm{Context}(k) = w \oplus \mathrm{Context}(c + 1) \wedge (M(X_{pc}), w) \to (M(X_{(\mathit{methodImpl},0)}), \varepsilon)$.

Now $pc : \text{exn} \to X_{pc}\langle \varepsilon \rangle$. All that remains to be shown is
$(M(X_{(\mathit{methodImpl},0)}), \mathrm{Context}(c + 1)) \to (u, x)$.

Consider  the  case  in  which  the  method  was  invoked  by  invokes  t at  ic.  Then 
Context(c  +  1)  =  pc'  ::  Conte xt(c) .  The  constraints  {  M metjmdimpl  Xpc'  T 'pc>m, 

T  ,  m  >„vn  X  .  }  are  induced  by  the  mle  for  invokestatic.  Therefore 
{  A7[X ( methodimpi,  o>)  <Pc'  M(XF')}  c  C.  Combining  this  with 

(M(Xpcf  Context(c))  >  (//,  x)  gives  (M{X(methodImph  0)),  Context(c  +  1))  >  (//,  x)  as 
required. 

Consider  the  case  in  which  the  method  was  invoked  by  invokevirtual.  Choose 
methodID  such  that  Instruction(pc')  =  invokevirtual  methodID .  Set  c  =  exn, 

/  =  c ,  v'  =  M(X(riu,lhodlriipL  0))  and  v  =  M(Xpc,) .  The initial constraints contain

methodimpi  ^exn  methodimpi ,  0)’  ^ pc',  m  ^exn  ^pe'’  ^  pc',  vO  ^ methodID  ^ pc',  •  Now  I 

appeal  to  the  preservation  of  virtual  call  types,  applied  to  {M(Xpc),  Context(c))  — >  (//,  x) , 
to  obtain  {M{X(methodImpl  0)),  Context (c  +  1))  >  (//,  x) ,  as  required. 

6.7.5.19  Induction  Step:  athrow  rule 
The  rule  for  athrow  gives 

Instruction(pc) = athrow

$\Sigma_k$ = [mode: RUNNING, pc: pc, wstack: v' :: S, locals: A, p]

$\Sigma_{k+1}$ = [mode: THROWING, pc: pc, wstack: v' :: S, locals: A, p]

Furthermore Context(k) = Context(k + 1), and the induced constraint is

$S_{pc} \rhd_{head} X_{pc}$.

The function R is:

R(stack-m) is undefined
R(exn) = stack-0
R(local-n) = local-n

Case: exp = stack-m. Then R(exp) = stack-m.

This case cannot occur because stack expressions do not evaluate to anything in the
Throwing state.

Case: exp = exn. Then R(exp) = stack-0.

The evaluation rules show v = v' and therefore $(\Sigma_k, pc : \text{stack-0})$ evaluates to v' = v.

Now, $(pc : exn,\ Context(k+1)) \to (u, x)$ implies that $(M(X_{pc}),\ Context(k)) \to (u, x)$. But
since $\{ M(S_{pc}) \rhd_{head} M(X_{pc}) \} \subseteq C$, it follows that $(pc : \text{stack-0},\ Context(k)) \to (u, x)$.
The result then follows directly from the induction hypothesis.

Case: exp = local-n.

The  proof  for  this  case  is  identical  to  the  proof  for  the  corresponding  case  for  spontaneous 
exception  throws. 

6.7.5.20  Induction  Step:  rule  for  exception  catching 

The  rule  for  exception  catching  gives 

$\Sigma_k$ = [mode: THROWING, pc: (method, offset), wstack: ref :: S, locals: A, p]

$\Sigma_{k+1}$ = [mode: RUNNING, pc: (method, handler), wstack: ref :: S, locals: A, p]

where for some classID, handler = CatchBlockOffset((method, offset), classID).

The initial constraints contain

S uccii method,  offset),  (method,  handle}),  S  exn-(titethod,  offsefy-classlD’  ^-‘(method,  offset))  ^ 

!  S  exn -(Method,  offsef)-classID  ^head  method ,  offset)  }• 

The function R is:

R(stack-m) is undefined, m > 0
R(stack-0) = exn
R(local-n) = local-n

We also have Context(k + 1) = (method, offset)-(method, handler) :: Context(k).

Case: exp = stack-m.

Since $(\Sigma_{k+1}, (method, handler) : \text{stack-}m)$ evaluates to v, v = ref and m = 0. Then
R(exp) = exn. The rules for evaluation give that $(\Sigma_k, (method, offset) : exn)$ evaluates to v.

Now,  ((method,  handler)  :  stack-0,  Context(/+  1))  — »  (?/,  x)  implies  that  for  some  t , 

{MS  (method,  handler))  »head  >)^C  and 

(t,  (method,  offset) -(method,  handler)  ::  Context(Z))  — »  (?/,  x) .  We  also  have 

(method,  handler ))  ^ (method,  offset) -(method,  handler)  M(S  eXn -(method,  offset)-classID )  }  —  ^  • 

Therefore,  for  some  f , 

{^  ^(method,  offset)-(method,  handler)  C  M(S  exn_( method,  offset )-c lass! tf  ^head  ^  —  C  -  Indeed, 
$t' = M(X_{(method,\ offset)})$. Therefore $(M(X_{(method,\ offset)}),\ Context(k)) \to (u, x)$. This implies
$((method, offset) : exn,\ Context(k)) \to (u, x)$. The result then follows from the induction
hypothesis.

Case: exp = local-n.

The proof of this case is identical to that for the corresponding case for load.

6.7.5.21 Induction Step: getfield rule
The rule for getfield gives

Instruction(pc) = getfield fieldID
$\Sigma_k$ = [pc: pc, wstack: ref :: S, heap: H, locals: A, p]

$\Sigma_{k+1}$ = [pc: pc + 1, wstack: HeapObjFields(H(Val(ref)))(fieldID) :: S, heap: H, locals: A, p]
Also, the induced constraints are

{  S pC  I>tail  ^-pc, t’  S pc  ^head  ^pc.obj’  ^pc.obj  ^ fieldID  '^pc.v’  ^  pc  ^head  '^pc.v’ 

S  pc  d tail  Succ(pc,pc+1,  S  pC,  L pC,  GpC,  XpC,  R^c) 

The function R is:

R(stack-m) = stack-m, m > 0
R(stack-0) = stack-0.fieldID
R(local-n) = local-n

Case: exp = stack-m, m > 0. Then R(exp) = stack-m.

The  evaluation  rules  show  HcapObj  Fields!1??!  V al  ( ref))) {fieldID ) ::  S  is  of  the  form 
v0'  ::  v j ' ::  ...  ::  vj  ::  S'  where  vj  =  v.  Therefore 

M  S  tack!  Hi,.)  =  ref::  v , '  ::  ...  ::  vj  ::  S  and  {Ek,pc  :  stack-/??)  vm'  =  v,  as  required. 

In  this  case  I  apply  the  succession  lemma  (6-22)  with  t  =  S  +  j  and 
c  =  tail  ::  ...  ::  tail ::  head  ::  s ,  with  m  >  0  occurrences  of  “tail”.  Also,  s  =  M{ S'  ) . 

pc 

Therefore  M{ S'  )  (c)  — >  f  where  (/',  Context!/:))  — »  (?/,  x)  ;  this  implies 
M{ Tpc  t)  (c')  — >  f  ,  where  c  =  tail ::  c' .  Then  M{Spc)  (c)  — >  t' .  Also 
PC(S^)  :  stack-/??  ~^M{ S  )  (c)  .  All  together  then, 

(PC(S^) :  s  tack-/??,  Context!/:))  — »  (?/,  x)  ;  the  result  follows  immediately  from  the 
induction  hypothesis. 

Case: exp = stack-0. Then R(exp) = stack-0.fieldID.

The  evaluation  rules  give  v  =  HeapObj Fields!^  V al ( ref)))  {fieldID )  and 

(S k,pc  :  stack-0  . fieldID )  v . 

In  this  case  I  apply  the  succession  lemma  (6-22)  with  t  =  Spc  +  {  and  c  =  head  ::  s  .  Also, 
x  =  M{ S'pc)  .  Therefore  A7(S'/)C) (head  ::  s>  — »  t'  where  {/',  Context!/:))  — »  (?/,  x)  ;  this 
implies  t'  =  M(T  v) .  Furthermore, 
$PC(\Sigma_k) : \text{stack-0}.fieldID \to S_{pc}\langle head :: fieldID :: \varepsilon \rangle$ and
$M(S_{pc})\langle head :: fieldID :: \varepsilon \rangle \to M(T_{pc,v})$. All together then,

$(PC(\Sigma_k) : \text{stack-0}.fieldID,\ Context(k)) \to (u, x)$; the result follows immediately from
the induction hypothesis.

Case: exp = local-n.

The  proof  of  this  case  is  identical  to  that  for  the  corresponding  case  for  load. 

6.7.5.22 Induction Step: putfield rule
The rule for putfield gives

Instruction(pc) = putfield fieldID

$\Sigma_k$ = [pc: pc, wstack: v' :: ref :: S, locals: A, p]

$\Sigma_{k+1}$ = [pc: pc + 1, wstack: S, locals: A, p]

The induced constraints are

{  Spc  I>tail  ^-pc, t’  SpC  I>head  ^ pc ,v»  ^tail  ^  pc ■>  ^-pc, t  ^head  Tpc.obj’ 

^ pc,  obj  > fieldID  V.v  }  ^  SuCClpC,  PC+  1 ,  S  pc’  Ppc’  ^fC’  ^ '-pc' 

The function R is:

R(stack-m) = stack-(m + 2)
R(local-n) = local-n

Case: exp = stack-m. Then R(exp) = stack-(m + 2).

The  evaluation  rules  show  S  is  of  the  form  v0'  ::  vf  ::  ...  ::  vj  ::  S'  where  vj  =  v . 
Therefore  MStacklE,.)  =  V  ::  ref::  vf  ::  ...  ::  vj  ::  S'  and 
(E.p,pc  :  stack- (777  +  2))  vj  =  v ,  as  required. 

In  this  case  I  apply  the  succession  lemma  (6-22)  with  t  =  S  +  x  and 
c  =  tail ::  ...  ::  tail ::  head  ::  s ,  with  m  occurrences  of  “tail”.  Also,  s  =  M( S'  ) . 

pc 

Therefore  M( S'  )  (c)  — >  f  where  (/',  Context(^))  — >  (//,  x)  ;  this  implies 
M(S  )  (tail ::  tail ::  c)  — »  f  .  Also  PC(H^)  :  stack-(7?7  +  2)  — »M( Spc)  (tail ::  tail ::  c)  . 
All  together  then,  (PC(S^)  :  stack-(7?7  +  2),  Context(^))  — »  (?/,  x)  ;  the  result  follows 
immediately  from  the  induction  hypothesis. 

Case: exp = local-n.

The  proof  of  this  case  is  identical  to  that  for  the  corresponding  case  for  load. 

6.7.5.23  Induction  Step:  getstatic  rule 
The  rule  for  getstatic  gives 

Instruction(pc) = getstatic staticField
$\Sigma_k$ = [pc: pc, wstack: S, globals: G, locals: A, p]

$\Sigma_{k+1}$ = [pc: pc + 1, wstack: G(staticField) :: S, globals: G, locals: A, p]
The induced constraints are

{  GpC  ^ staticField  ^ pc ,v>  ^  pc  ^tail  ^pc>  ^  pc  ^head  ^pc,v  )  ^ 

Succ(pc,  pc+ 1 ,  S  pC,  L pC,  Gpc,  Xpc,  RpC). 

The function R is:

R(stack-m) = stack-(m - 1), m > 0
R(stack-0) = staticField
R(local-n) = local-n

Case: exp = stack-m, m > 0. Then R(exp) = stack-(m - 1).

The proof for this case is identical to that for the corresponding case for load.

Case: exp = stack-0. Then R(exp) = staticField.

The  evaluation  rules  give  v  =  tf( staticField )  and  therefore  (S p,pc  :  staticField)  v . 

In  this  case  I  apply  the  succession  lemma  (6-22)  with  t  =  S/)c  +  (  and  c  =  head  ::  s  .  Also, 
5  =  M( S'pc) .  Therefore  Ad(S'/)c)  (head  ::  s>  — > 1’  where  (f ,  Context(^))  — »  (?/,  x)  ;  this 
implies  f  =  M(Tpc  v) .  Furthermore,  PC(S^)  :  staticField  — »  G  ( staticField ::  s)  and 
M(Gpc)  (staticField ::  s>  — »M( [T  v) .  All  together  then, 

(PC(SA.)  :  staticField,  Context(^))  — »  (?/,  x)  ;  the  result  follows  immediately  from  the 
induction  hypothesis. 

Case: exp = local-n.

The  proof  of  this  case  is  identical  to  that  for  the  corresponding  case  for  load. 

6.7.5.24  Induction  Step:  putstatic  rule 
The  rule  for  putstatic  gives 

Instruction(pc) = putstatic fieldID
$\Sigma_k$ = [pc: pc, wstack: v' :: S, locals: A, p]

$\Sigma_{k+1}$ = [pc: pc + 1, wstack: S, locals: A, p]

The  induced  constraints  are 

{  Spc  ^tail  ^  pc  ^ pc  ^head  ^ pc ,v>  ^ pc  ^fiekllD  ^ pc,v  )  ^ 

Succ (pc,  pc+ 1 ,  S  pC,  L pC,  GpC,  Xpc,  RpC). 

The function R is:

R(stack-m) = stack-(m + 1)
R(local-n) = local-n

Case: exp = stack-m. Then R(exp) = stack-(m + 1).

The proof for this case is identical to that for the corresponding case for store.

Case: exp = local-n.

The proof of this case is identical to that for the corresponding case for load.

6.7.5.25  Induction  Step:  iadd  rule 
The  rule  for  iadd  gives 

Instruction(pc) = iadd classID
$\Sigma_k$ = [pc: pc, wstack: $v_1$ :: $v_2$ :: S, locals: A, p]

$\Sigma_{k+1}$ = [pc: pc + 1, wstack: (Val($v_1$) + Val($v_2$), t) :: S, locals: A, p]

The  induced  constraints  are 

{  Spc  ^tail  ^ pc, tb  ^ pc  X\  ^tail  ^pc, t2’  ^  pc  ^tail  ^ pc, t2’  ^  pc  ^head  ^ 'pc,v  ) 

Succ(pc,pc+1,  S  pC,  L pC,  G pC,  Xpc,  RpC) 

The function R is:

R(stack-m) = stack-(m + 1), m > 0
R(stack-0) is undefined
R(local-n) = local-n
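
The index shift in R can be checked against a toy operand stack. The following Python sketch is only an illustration under simplifying assumptions: it models the working stack as a list whose index 0 is the top, ignores locals and value tags, and uses the hypothetical names `step_iadd` and `r_iadd`, which do not appear in the thesis.

```python
def step_iadd(stack):
    """Pop the two topmost values and push their sum (index 0 is the top)."""
    v1, v2, *rest = stack
    return [v1 + v2, *rest]


def r_iadd(m):
    """R for iadd: stack-m after the step maps to stack-(m + 1) before it."""
    if m == 0:
        return None  # stack-0 holds a newly created value, so R is undefined
    return m + 1


before = [3, 4, 10, 20]
after = step_iadd(before)

# Every surviving stack slot holds the value that R predicts.
for m in range(1, len(after)):
    assert after[m] == before[r_iadd(m)]
```

The same check with an offset of 2 and no pushed value would correspond to the R given for putfield.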

Case: exp = stack-m. Then R(exp) = stack-(m + 1).

The  evaluation  rules  show  (Va^Vj)  +  Val(v9),  t )  ::  S  is  of  the  form 
v0'  ::  v,' ::  ...  ::  vj  ::  S'  where  vj  =  v.  Therefore 

MStack(S^)  =  vj  ::  v2  ::  v{ vj  ::  S'  and  (Eh pc:  stack-(w  +  1))  vj  =  v,as 
required. 

In  this  case  I  apply  the  succession  lemma  (6-22)  with  t  =  S  +  (  and 
c  =  tail ::  ...  ::  tail ::  head  ::  s ,  with  m  >  0  occurrences  of  “tail”.  Also,  v  =  M( S'  ) . 

pc 

Therefore  M( S'  )  (c)  — »  f  where  (/',  Context!/:))  — »  {it,  x)  .  This  implies  that  c  is  of  the 
form  tail ::  c'  where  M{Tpc  t2)  (c')  — > t' .  This  in  turn  implies  M( S  )  (tail ::  tail  ::  c')  — »  t' . 
Also  PC(EA.)  :  stack- {m  +  1)  Spc)  (tail  ::  c)  .  All  together  then, 

(PC(  E/f)  :  stack- (m  +1),  Context!/:))  — »  (//,  x)  ;  the  result  follows  immediately  from  the 
induction  hypothesis. 

Case: exp = stack-0.

Then Creation(v) = (k + 1, (pc + 1) : stack-0) by the definition of Creation, so the
induction result is satisfied.

Case: exp = local-n.

The  proof  of  this  case  is  identical  to  that  for  the  corresponding  case  for  load. 

6.7.5.26 Induction Step: if_cmpeq rules
The rules for if_cmpeq give
Instruction(pc) = if_cmpeq offset
$\Sigma_k$ = [pc: pc, wstack: v' :: S, locals: A, p]

$\Sigma_{k+1}$ = [pc: pc', wstack: S, locals: A, p]

where either pc' = pc + 1 or pc' = (CodeLocMethod(pc), offset).

The  induced  constraints  are 

{  ^pc  ^taii  pc  !  Succ(pc,  pc+ 1 ,  S  pc,  L pC,  Gpc,  Xpc,  R^c)  --J 
Su cc(pc,  (CodeLocMethod(pc),  offset),  S’pc,  Lpc,  Gpc,  Xpc,  Rpc). 

The function R is:

R(stack-m) = stack-(m + 1)
R(local-n) = local-n

Case: exp = stack-m.

The proof for this case is identical to that for the corresponding case for store. The
successor lemma is applicable regardless of which branch is taken.

Case: exp = local-n.

The  proof  of  this  case  is  identical  to  that  for  the  corresponding  case  for  load.  The 
successor  lemma  is  applicable  regardless  of  which  branch  is  taken. 

6.7.5.27  Induction  Step:  goto  rule 

The rule for goto gives

Instruction(pc) = goto offset
$\Sigma_k$ = [pc: pc, wstack: S, locals: A, p]

$\Sigma_{k+1}$ = [pc: (CodeLocMethod(pc), offset), wstack: S, locals: A, p]

The induced constraints are

Succ(pc, (CodeLocMethod(pc), offset), $S_{pc}$, $L_{pc}$, $G_{pc}$, $X_{pc}$, $R_{pc}$)

The function R is:

R(stack-m) = stack-m
R(local-n) = local-n

Case: exp = stack-m.

The  evaluation  rules  show  S  is  of  the  form  v0'  ::  ig' ::  ...  ::  vj  ::  S’  where  vj  =  v . 
Therefore  (S k,pc  :  stack-777)  vj  =  v ,  as  required. 

In  this  case  I  apply  the  succession  lemma  (6-22)  with  t  =  S  +  ,  and 
c  =  tail ::  ...  ::  tail ::  head  ::  s ,  with  m  occurrences  of  “tail”.  Also,  5  =  M( S'  ) . 

pc 

Therefore $M(S'_{pc})\langle c \rangle \to t'$ where $(t',\ Context(k)) \to (u, x)$. Also
$PC(\Sigma_k) : \text{stack-}m \to M(S_{pc})\langle c \rangle$. All together then,

$(PC(\Sigma_k) : \text{stack-}m,\ Context(k)) \to (u, x)$; the result follows immediately from the
induction hypothesis.

Case: exp = local-n.

The  proof  of  this  case  is  identical  to  that  for  the  corresponding  case  for  load. 

6.7.5.28  Induction  Step:  instanceof  rules 
The  rules  for  instanceof  give 

Instruction(pc) = instanceof fieldID
$\Sigma_k$ = [pc: pc, wstack: ref :: S, locals: A, p]

$\Sigma_{k+1}$ = [pc: pc + 1, wstack: (v', t) :: S, locals: A, p]

for some value of v'.

The  induced  constraints  are 

{  Spc  ^tail  Tpcfi  ^  pc  ^tail  ^-pc, t’  ^  pc  ^head  ^ pc,\  } 

Succ (pc,  pc+ 1 ,  S  pC,  L pC,  Gpc,  X^,c,  RpC) 

The function R is:

R(stack-m) = stack-m, m > 0
R(stack-0) is undefined
R(local-n) = local-n

Case: exp = stack-m, m > 0. Then R(exp) = stack-m.

The proof for this case is the same as the proof for the corresponding case for the
getfield rule.

Case: exp = stack-0.

Then Creation(v) = (k + 1, (pc + 1) : stack-0) by the definition of Creation, so the
induction result is trivially satisfied.

Case: exp = local-n.

The  proof  of  this  case  is  identical  to  that  for  the  corresponding  case  for  load. 

6.7.5.29  Induction  Step:  checkcast  rule 

The proof for this case is the same as for the goto rule. A successful checkcast does
not change the state in any way.

7 SEMI Implementation


7.1  Introduction 

Chapter 6 describes the SEMI constraint system and how it is used to derive safe
approximations to the value-point relation. That chapter assumes the existence of an algorithm for
deriving  a  closed  set  of  constraints  from  a  given  initial  set.  In  this  chapter,  I  describe  such 
an  algorithm,  as  implemented  in  Ajax’s  SEMI  analysis  engine. 

First I describe the basic algorithm, and then I present a series of refinements that
improve its performance. I also discuss some changes to the algorithm that
I tried and rejected because they decreased performance.

Finally,  I  discuss  some  changes  to  the  constraint  generation  phase  that  simplify  the  initial 
constraint  set  while  leading  to  the  same  results. 

7.1.1  Solver  Specification 

Given an initial constraint set $C_I$, the job of the solver is simply to find a closed set C
containing $C_I$.

$C_I$ represents constraints induced by the program under analysis. C represents an extension
of those constraints into a complete and consistent description of the “types” in the
program.

Note that such a C always exists. For example, given $C_I$, we can add constraints making all
variables equal and making all component and instance relationships hold between all
variables. (The resulting set is finite because only the variables, component labels and
instance labels that occur in $C_I$ need be considered.) Effectively this gives all expressions
the same type. In practice this result would not be useful; it is preferable to retain
distinctions between types whenever possible. However, this example illustrates that
implementations of the specification can trade off accuracy for performance.
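
The degenerate solution described above can be written down directly. The following Python sketch assumes a hypothetical encoding that is not taken from Ajax: component constraints are triples `(t, c, v)`, instance constraints are triples `(t, i, u)`, and `TOP` is an illustrative name for the single variable to which every variable is collapsed.

```python
TOP = "T"  # hypothetical name for the one variable that replaces all others


def trivial_closed_set(components, instances):
    """Collapse every variable to TOP, keeping only the labels in use.

    components: iterable of (t, c, v) triples, read "v is component c of t"
    instances:  iterable of (t, i, u) triples, read "u is instance i of t"
    """
    component_labels = {c for _, c, _ in components}
    instance_labels = {i for _, i, _ in instances}
    closed_components = {(TOP, c, TOP) for c in component_labels}
    closed_instances = {(TOP, i, TOP) for i in instance_labels}
    return closed_components, closed_instances


comps, insts = trivial_closed_set(
    [("Tf", "result", "Tr"), ("Tr", "head", "Tx")],
    [("Tf", "i1", "Tr")],
)
```

With only one variable left, both propagation rules are satisfied trivially, and every expression receives the same type, which is why the result is of no practical use.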

7.1.2  Decidability  and  Performance 

Henglein  [42]  shows  that  the  problem  of  finding  a  principal  (i.e.,  most  general)  type  is 
undecidable  in  the  general  setting  of  polymorphic  recursion.  However,  in  practice  all 
examples seem tractable. In fact, Henglein's algorithm is reported to be quite efficient at
inferring types for functional programs.

SEMI is similar to Henglein's algorithm and likewise has no guarantee of termination. (In
fact, because SEMI can infer recursive types, the situation is theoretically even more dire
than for Henglein's algorithm: typable programs exist that have no principal types. See
Appendix A for details.) However, nonterminating cases have always been traced back to
errors in the solver implementation. Because the worst cases may not even terminate,
efficiency depends on the characteristics of “average case” programs. Therefore we must
measure performance and precision empirically.

In fact, the problem of finding a closed constraint set is not the same problem as finding
principal types. As noted above, there is no unique solution to the problem of finding a
closed set, and a trivial closed set can always be found (see footnote 1). However, for the sake of precision
we want the analysis to distinguish types whenever possible, just as we do when inferring
principal types.

7.1.3  Refined  Specification 

The SEMI analysis engine extracts an approximate value-point relation from the closed set
C. This relation is the only function of C that is used. Therefore we can relax the
specification of the engine to allow it to produce any set C' that (for a given set Q of query
expressions) gives the same relation as that derived from a closed set C. I will call such a set C'
quasi-closed with respect to Q. This relaxation enables many optimizations.

The  analysis  engine  actually  computes  a  propagation  graph  from  the  constraint  set  and  not 
a  direct  approximation  to  the  value-point  relation  (see  Section  6.6.1).  However,  as  shown 
in  Section  6.6,  the  results  computed  over  the  graph  are  completely  determined  by  the 
approximate value-point relation defined for the constraint set. Therefore if C' induces the
same approximate relation, the results obtained from the propagation graph on C' will
be the same as the results for C's graph.

From the definition of the approximate value-point relation in Section 6.5.1, the analysis
concludes $e_1 \leftrightarrow e_2$ if and only if

$\exists u, x_1, x_2, x_1', x_2'.\ (e_1, x_1) \to (u, x_1') \land (e_2, x_2) \to (u, x_2')$

By the instance transitivity property (Lemma 6-7), this is equivalent to

$\exists u, x_1, x_2.\ (e_1, x_1) \to (u, \varepsilon) \land (e_2, x_2) \to (u, \varepsilon)$

Let M be a map from bytecode expressions to constraint variables, defined as M(e) = a
where $\exists c, a'.\ e \to a'\langle c \rangle \land a'\langle c \rangle \to a$. M(e) is defined for all expressions in the query set
Q; this is guaranteed by the precautions in Section 6.4.5. Then the analysis concludes
$e_1 \leftrightarrow e_2$ if and only if

$\exists u, x_1, x_2.\ (M(e_1), x_1) \to (u, \varepsilon) \land (M(e_2), x_2) \to (u, \varepsilon)$

From these definitions, it follows that C' is quasi-closed if there exists a C such that

• C is closed

• C contains $C_I$

• $\forall t, v \in Variables(C_I)$. $\exists u, x_1, x_2.\ (t, x_1) \to (u, \varepsilon) \land (v, x_2) \to (u, \varepsilon)$ in C' if and only if
$\exists u, x_1, x_2.\ (t, x_1) \to (u, \varepsilon) \land (v, x_2) \to (u, \varepsilon)$ in C.


1. For this reason, we could guarantee termination by timing out and falling back to an algorithm that is
guaranteed to terminate. SEMI does not do this, however; choosing a suitable timeout interval and selecting
an algorithm to fall back on appear to be rather complex problems.

7.1.4 Basic Structure

This chapter describes a series of algorithms leading up to the full SEMI algorithm, each
more sophisticated than the last. All the algorithms commence with the initial constraint set
$C_I$ and add constraints to the set until it is closed (or quasi-closed).

Because the addition of new constraints to the set is a fundamental operation in the
algorithms, it is not difficult to extend these algorithms to be incremental. One can add to
the initial constraint set $C_I$ at any time and then continue to add derived constraints until
reaching (quasi-)closure.

7.2  Basic  Algorithm 

The  basic  algorithm  presented  in  this  section  corresponds  to  Henglein’s  type  inference 
procedure  [42]. 

The  general  procedure  is  to  start  with  a  set  of  initial  constraints  (the  input)  and  repeatedly 
add  constraints  to  the  set  until  it  reaches  closed  form  (the  output).  This  is  complicated  by 
the  fact  that  the  initial  constraint  set  can  increase  during  processing,  and  the  new  constraints 
can be observed by tools as soon as they are added (i.e., the results are reported
incrementally).

Therefore, in reality, the SEMI solver takes a set of constraints as input. If the set is already
in closed form, it reports termination; otherwise it adds some constraints to the set and
reports the changes in the output of the analysis. The added constraints are chosen to move
the  set  “closer”  to  closure;  that  is,  if  the  constraint  set  output  by  one  step  is  always  used  as 
the  input  to  the  next  step,  the  algorithm  should  terminate  (although  as  discussed  above,  we 
cannot  guarantee  that  it  will  terminate). 

7.2.1  Representation  of  Equality 

Like every algorithm of this kind, the SEMI solver uses a representation of the constraint
set that avoids explicit equality constraints. Whenever a constraint of the form “a = b” is
encountered or produced, it is discarded, and the solver substitutes b for a (or a for b) in all
other constraints. This can be implemented efficiently by treating each variable as an
equivalence class and employing the union-find algorithm to merge equivalence classes.
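
As an illustration of this representation, the following Python sketch shows a standard union-find structure over constraint variables. It is a generic textbook version (union by rank with path halving), not a transcription of the Ajax source; the names `Var`, `find` and `union` are assumptions made for the example.

```python
class Var:
    """A constraint variable; variables made equal share one representative."""

    def __init__(self, name):
        self.name = name
        self.parent = self  # a representative is its own parent
        self.rank = 0


def find(v):
    """Return the representative of v's equivalence class (path halving)."""
    while v.parent is not v:
        v.parent = v.parent.parent
        v = v.parent
    return v


def union(a, b):
    """Record the equality a = b and return the surviving representative."""
    ra, rb = find(a), find(b)
    if ra is rb:
        return ra
    if ra.rank < rb.rank:
        ra, rb = rb, ra
    rb.parent = ra  # every constraint mentioning rb now resolves to ra
    if ra.rank == rb.rank:
        ra.rank += 1
    return ra


a, b, c = Var("a"), Var("b"), Var("c")
union(a, b)
```

Constraints then store variables as written and call `find` whenever they are read, so a substitution never has to rewrite the constraint set explicitly.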

7.2.2  Functional  Representation  of  Components  and  Instances 

The component consistency rule guarantees that for a given variable t and component label
c, there is at most one v such that $t \rhd_c v$ (after taking into account equivalences). Thus the
component constraints are represented as a curried partial function $F_\rhd : V \to L \rightharpoonup V$.

Likewise the instance constraints are represented as $F_\preceq : V \to I \rightharpoonup V$.

In the implementation, each variable v has two hash tables associated with it, one
representing $F_\rhd(v)$ and the other $F_\preceq(v)$.

When a variable v is substituted for u because u and v have been made equal, u's $L \rightharpoonup V$
component map is merged into v's $L \rightharpoonup V$ component map. The tricky part of this process
is that for each l in the intersection of their domains, the variable $F_\rhd(u)(l)$ is made equal to
the variable $F_\rhd(v)(l)$; thus, the merge procedure can invoke itself recursively. The
procedure corresponds to term unification.

The algorithm also merges u's $I \rightharpoonup V$ instance map into v's $I \rightharpoonup V$ instance map. This is
similar to the case of the component maps, and can also result in recursive merge calls.
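
The recursive merge can be sketched as follows. This Python example is a simplified illustration, assuming that each variable carries two dictionaries standing in for the hash tables described above; the names `Var`, `find` and `merge` are hypothetical, and the union step omits rank balancing for brevity.

```python
class Var:
    def __init__(self, name):
        self.name = name
        self.parent = self
        self.components = {}  # component label -> Var
        self.instances = {}   # instance label -> Var


def find(v):
    """Return the current representative of v."""
    while v.parent is not v:
        v.parent = v.parent.parent
        v = v.parent
    return v


def merge(u, v):
    """Make u and v equal, folding u's maps into v's maps."""
    u, v = find(u), find(v)
    if u is v:
        return
    u.parent = v  # substitute v for u
    for table in ("components", "instances"):
        moved = getattr(u, table)
        setattr(u, table, {})
        for label, target in moved.items():
            # Look up the representative again: an earlier recursive call
            # may have merged v into some other variable.
            rep_map = getattr(find(v), table)
            if label in rep_map:
                merge(target, rep_map[label])  # shared label: unify targets
            else:
                rep_map[label] = target


u, v = Var("u"), Var("v")
a, b, d = Var("a"), Var("b"), Var("d")
u.components = {"head": a, "elem": d}
v.components = {"head": b}
merge(u, v)
```

Map values may refer to variables that were merged away later, so readers of the maps are assumed to apply `find` to whatever they look up.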

7.2.3  Component  Propagation 

The  above  normalization  procedures  ensure  that  the  constraint  set  is  always  closed  under 
all  rules  except  for  the  component  and  instance  propagation  rules. 

We treat the remaining rules as production rules:

• Component propagation

Upon detecting $\{ t \preceq_i u,\ t \rhd_c v \} \subseteq C$ for some t, u, v, i and c, add a new variable w and
constraint $u \rhd_c w$ (unless there is already a w such that $u \rhd_c w$).

• Instance propagation

Upon detecting $\{ t \preceq_i u,\ t \rhd_c v,\ u \rhd_c w \} \subseteq C$ for some t, u, v, w, i and c, add a constraint
$v \preceq_i w$ (if not already present).

These are implemented using a worklist. The algorithm maintains a list of “dirty”
component constraints (e.g. “$t \rhd_c v$”) that must be checked by the component propagation
rule. All component constraints in $C_I$ start off in the dirty list. Whenever a new component
constraint is added to $C_I$, it is added to the dirty list. Whenever a variable t is substituted for
another variable w, all the components of t that do not already appear in w are made dirty,
and likewise all the components of w that do not already appear in t are made dirty.
Formally:

$\{ t \rhd_c v \mid \{ t \rhd_c v \} \subseteq C \land (\lnot \exists u.\ \{ w \rhd_c u \} \subseteq C) \}\ \cup$
$\{ w \rhd_c v \mid \{ w \rhd_c v \} \subseteq C \land (\lnot \exists u.\ \{ t \rhd_c u \} \subseteq C) \}$

Also, whenever an instance constraint $t \preceq_i u$ is added, all the components of t and u are
made dirty.

During each iteration of the solver, it pulls one dirty component constraint $t \rhd_c v$ from the
dirty list. Then for each u and i such that $\{ t \preceq_i u \} \subseteq C$, the two production rules are
checked. Also, for each u and i such that $\{ u \preceq_i t \} \subseteq C$, the second production rule is
checked, swapping u with t and v with w so that the actual rule checked is

• Upon detecting $\{ u \preceq_i t,\ u \rhd_c w,\ t \rhd_c v \} \subseteq C$ for some t, u, v, w, i and c, add a constraint
$w \preceq_i v$ (if not already present).

Note that when checking this rule, since u and c are known, there can be at most one
applicable w.

Iteration continues until the worklist of dirty component constraints is empty. Upon
termination, the constraint set is closed.

When an equality constraint is processed by applying a substitution to the entire constraint
set, the same substitution is applied to the elements of the worklist. Of course, this is done
efficiently using a union-find data structure.
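
The worklist loop for the two production rules can be sketched as follows. This Python example is a deliberately reduced illustration, not the SEMI implementation: it assumes variables are plain strings, stores component constraints in a dictionary `{(t, c): v}` and instance constraints in a set of `(t, i, u)` triples, and omits equality handling entirely, so it is only meaningful for inputs that never force two variables to be merged. The function name `close`, the fresh-variable names `w0, w1, ...` and the `max_steps` bound are all assumptions of the sketch; the bound is there because, as Section 7.1.2 notes, termination is not guaranteed.

```python
from collections import deque
from itertools import count


def close(components, instances, max_steps=1000):
    """Apply component and instance propagation until no dirty work remains.

    Mutates both arguments. Returns True if the worklist empties within
    max_steps iterations, False otherwise.
    """
    fresh = count()
    dirty = deque(components)  # every initial component constraint is dirty
    queued = set(dirty)

    def mark_dirty(key):
        if key not in queued:
            queued.add(key)
            dirty.append(key)

    def add_instance(a, i, b):
        if (a, i, b) in instances:
            return
        instances.add((a, i, b))
        # A new instance constraint dirties all components of both ends.
        for key in list(components):
            if key[0] in (a, b):
                mark_dirty(key)

    for _ in range(max_steps):
        if not dirty:
            return True
        t, c = dirty.popleft()
        queued.discard((t, c))
        v = components[(t, c)]
        for src, i, dst in list(instances):
            if src == t:
                # Component propagation: the instance dst needs a c-component.
                if (dst, c) not in components:
                    components[(dst, c)] = f"w{next(fresh)}"
                    mark_dirty((dst, c))
                # Instance propagation between the two c-components.
                add_instance(v, i, components[(dst, c)])
            if dst == t and (src, c) in components:
                # The swapped rule: t is the instance, src the source.
                add_instance(components[(src, c)], i, v)
    return not dirty


comps = {("t", "head"): "a"}
insts = {("t", 1, "u")}
finished = close(comps, insts)
```

In this small run the solver creates one fresh variable for the `head` component of `u` and relates it to `a` through the same instance label.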
7.2.4 Saving Time By Recording Additional Dirtiness Information

For some variables t there may be many u such that $t \preceq_i u$ or $u \preceq_i t$. When a dirty component
$t \rhd_c v$ is being processed, it can be slow to scan all the instances u such that $t \preceq_i u$ and all
the sources u such that $u \preceq_i t$. Therefore for each dirty component $t \rhd_c v$, we maintain a list
of all the $t \preceq_i u$ and $u \preceq_i t$ that need to be inspected in conjunction with the $t \rhd_c v$ constraint.
For every situation in which a component constraint may become dirty, there is an
associated set of instance and source constraints that will need to be inspected.

When a new component constraint $t \rhd_c v$ is added, all constraints of the form $t \preceq_i u$ and
$u \preceq_i t$ need to be inspected in conjunction with $t \rhd_c v$.

When a variable t is substituted for variable w, then for each $t \rhd_c v$ such that $\{ t \rhd_c v \} \subseteq C \land (\lnot \exists u.\ \{ w \rhd_c u \} \subseteq C)$,
all constraints of the form $w \preceq_i u$ and $u \preceq_i w$ need to be inspected
in conjunction with $t \rhd_c v$. Likewise, for each $w \rhd_c v$ such that
$\{ w \rhd_c v \} \subseteq C \land (\lnot \exists u.\ \{ t \rhd_c u \} \subseteq C)$, all constraints of the form $t \preceq_i u$ and $u \preceq_i t$ need to be inspected in
conjunction with $w \rhd_c v$.

Whenever an instance constraint $t \preceq_i u$ is added, then for each $t \rhd_c v$ in C, the instance
constraint $t \preceq_i u$ must be inspected in conjunction with $t \rhd_c v$. Also, for each $u \rhd_c v$ in C, the
source constraint $t \preceq_i u$ must be inspected in conjunction with $u \rhd_c v$.

This  additional  bookkeeping  greatly  improves  runtime,  while  adding  some  space  overhead. 
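
One way to picture this bookkeeping is a map from each dirty component constraint to the instance and source constraints still to be inspected with it. The Python sketch below covers only the two addition cases described above and leaves out the substitution case; the class name `DirtyList` and its methods are hypothetical, and constraints use the same illustrative tuple encoding as a dictionary keyed by `(t, c)` and a set of `(t, i, u)` triples.

```python
from collections import defaultdict


class DirtyList:
    """Dirty component constraints with their pending instance constraints."""

    def __init__(self):
        # (t, c) -> set of (src, i, dst) constraints to inspect with it
        self.pending = defaultdict(set)

    def on_new_component(self, t, c, instances):
        """A constraint 't has component c' was added: pair it with every
        instance or source constraint that mentions t."""
        for src, i, dst in instances:
            if src == t or dst == t:
                self.pending[(t, c)].add((src, i, dst))

    def on_new_instance(self, t, i, u, components):
        """An instance constraint from t to u was added: pair it with every
        component constraint of t and of u."""
        for owner, c in components:
            if owner in (t, u):
                self.pending[(owner, c)].add((t, i, u))

    def pop(self):
        """Remove and return one dirty component with its pending set."""
        return self.pending.popitem()


dirty = DirtyList()
components = {("t", "head"): "a", ("u", "head"): "b", ("z", "head"): "q"}
dirty.on_new_instance("t", 1, "u", components)
```

The solver then inspects only the constraints recorded for a popped entry instead of scanning every instance and source of the variable, at the cost of the extra sets.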

7.2.5  Overview  of  an  Algorithm  Step 

An iteration of the solver proceeds as follows:

1. Remove a dirty component constraint $t \rhd_c v$ from the worklist, with its associated sets
of dirty source constraints S and dirty instance constraints I.

2. For each dirty source constraint $u \preceq_i t$ in S, we have $\{ u \preceq_i t,\ t \rhd_c v \} \subseteq C$. Each
production rule has premises of the form $P \subseteq C$. For each rule, and for each instantiation of the
free variables of P such that $\{ u \preceq_i t,\ t \rhd_c v \} \subseteq P$ and $P \subseteq C$, SEMI applies the rule to
obtain a set of constraints that must be included in the new constraint set. Each new
constraint not already in the set is added and the dirty worklist is updated appropriately.

3. For each dirty instance constraint $t \preceq_i u$ in I, we have $\{ t \preceq_i u,\ t \rhd_c v \} \subseteq C$. For each
production rule, and for each instantiation of the free variables of the rule's premises P
such that $\{ t \preceq_i u,\ t \rhd_c v \} \subseteq P$ and $P \subseteq C$, SEMI applies the rule to obtain a set of
constraints to add, as above.

For each rule, it is easy to determine the possible values of P given that $\{ u \preceq_i t,\ t \rhd_c v \} \subseteq P$
or $\{ t \preceq_i u,\ t \rhd_c v \} \subseteq P$.

Consider the component propagation rule. P is of the form $\{ q \preceq_i r,\ q \rhd_c s \}$. When checking
dirty instances, we have $\{ t \preceq_i u,\ t \rhd_c v \} \subseteq P$. The only possibility is $P = \{ t \preceq_i u,\ t \rhd_c v \}$,
so the consequence of the rule is $\exists w.\ \{ u \rhd_c w \} \subseteq C$. When checking dirty sources, we
have $\{ u \preceq_i t,\ t \rhd_c v \} \subseteq P$. The only possibility is $P = \{ u \preceq_i t,\ t \rhd_c v \}$, but then since P is
of the form $\{ q \preceq_i r,\ q \rhd_c s \}$, we must have u = t and $P = \{ t \preceq_i t,\ t \rhd_c v \}$. In this case the
consequence of the rule ($\exists w.\ \{ t \rhd_c w \} \subseteq C$) is already satisfied with w = v, and so this
case need not be checked.

Consider the instance propagation rule. P is of the form $\{ q \preceq_i r,\ q \rhd_c s,\ r \rhd_c z \}$. When
checking dirty instances, we have $\{ t \preceq_i u,\ t \rhd_c v \} \subseteq P$. The only possibility is that
$P = \{ t \preceq_i u,\ t \rhd_c v,\ u \rhd_c z \}$ for some z. Since u and c are known, there can only be one possible
value for z and it can be found by inspecting C, i.e., P is completely determined. When
checking dirty sources, we have $\{ u \preceq_i t,\ t \rhd_c v \} \subseteq P$ and the only possibility is that
$P = \{ u \preceq_i t,\ u \rhd_c s,\ t \rhd_c v \}$ for some s. Again u and c are known, so the value of s is determined.

Subsequent  sections  describe  enhancements  to  the  basic  algorithm  which  introduce  new 
rules,  but  in  each  case  it  is  just  as  easy  to  determine  how  the  variables  of  the  rules  are  to  be 
instantiated. 

7.2.6  The  Extended  Occurs  Check 

It  is  easy  to  construct  constraint  sets  for  which  this  algorithm  does  not  terminate. 
Furthermore,  these  sets  do  arise  in  practice. 

For example, consider the set $\{ T_f \rhd_{result} T_r,\ T_f \preceq_i T_r \}$. This could arise from an
analysis of the following program:

f() { return f; }

f’s result is an instance of f. (This is a contrived example. Real examples in Java are more
complicated, e.g., a method M that returns a reference to a new object which contains M.)

Suppose  we  apply  the  above  algorithm  to  this  constraint  set: 

• Apply component propagation to $\{ T_f \rhd_{result} T_r,\ T_f \preceq_i T_r \}$:
add $T_1$ and constraint $\{ T_r \rhd_{result} T_1 \}$

• Apply instance propagation to $\{ T_f \rhd_{result} T_r,\ T_f \preceq_i T_r,\ T_r \rhd_{result} T_1 \}$:
add constraint $\{ T_r \preceq_i T_1 \}$

• Apply component propagation to $\{ T_r \rhd_{result} T_1,\ T_r \preceq_i T_1 \}$:
add $T_2$ and constraint $\{ T_1 \rhd_{result} T_2 \}$

• Apply instance propagation to $\{ T_r \rhd_{result} T_1,\ T_r \preceq_i T_1,\ T_1 \rhd_{result} T_2 \}$:
add constraint $\{ T_1 \preceq_i T_2 \}$
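
The divergence traced above can be reproduced with a small sketch. The Java class below is illustrative only and is not taken from the SEMI implementation: the name NaiveClosure, its methods and the cap maxFresh are assumptions introduced here, instance labels are ignored, and the equality and consistency rules are omitted. It applies component propagation and instance propagation until nothing changes, and gives up once it would need more than maxFresh fresh variables.

```java
import java.util.*;

/** Hypothetical sketch of the two propagation rules, with a cap on fresh variables. */
final class NaiveClosure {
    // Instance constraints q <= r stored as pairs [q, r]; instance labels are ignored.
    final Set<List<String>> instances = new LinkedHashSet<>();
    // Component constraints: parent -> (label -> child), at most one child per label.
    final Map<String, Map<String, String>> components = new LinkedHashMap<>();
    private int fresh = 0;

    void addInstance(String q, String r) {
        instances.add(Arrays.asList(q, r));
    }

    void addComponent(String q, String label, String s) {
        components.computeIfAbsent(q, k -> new LinkedHashMap<>()).put(label, s);
    }

    /** Returns true if the set became closed, false if the cap on fresh variables was hit. */
    boolean close(int maxFresh) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (List<String> inst : new ArrayList<>(instances)) {
                String q = inst.get(0), r = inst.get(1);
                Map<String, String> qc = components.getOrDefault(q, Collections.emptyMap());
                for (Map.Entry<String, String> e : new ArrayList<>(qc.entrySet())) {
                    Map<String, String> rc =
                        components.computeIfAbsent(r, k -> new LinkedHashMap<>());
                    String z = rc.get(e.getKey());
                    if (z == null) {
                        // Component propagation: r needs a component with this label.
                        if (fresh == maxFresh) return false;
                        z = "T" + (++fresh);   // fresh names T1, T2, ... (assumed unused)
                        rc.put(e.getKey(), z);
                        changed = true;
                    }
                    // Instance propagation: the child of q is a source of the child of r.
                    if (instances.add(Arrays.asList(e.getValue(), z))) changed = true;
                }
            }
        }
        return true;
    }
}
```

Starting from the two constraints of the example with a cap of 5, close returns false after creating T1 through T5, whereas a set without the cycle (a variable A with one component and one instance B) closes after a single fresh variable.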


In type inference, the type of f would be an infinite term:
$\text{void} \to (\text{void} \to (\text{void} \to \cdots))$

This recursive type is not valid in Henglein’s scheme; therefore his algorithm detects this
situation and reports failure. He calls this detection the “extended occurs check”. (It is
analogous to the occurs check performed during term unification.) In terms of the SEMI
formalism, the extended occurs check fires whenever, for some sets of variables $t_i$ and $u_i$,

$$\{\, t_1 \preceq_{i_1} u_1,\ \ldots,\ u_{n-1} \preceq_{i_n} u_n,\ t_1 \rhd_{comp_1} t_2,\ \ldots,\ t_m \rhd_{comp_m} u_n \,\} \subseteq C$$

This means that the extended occurs check is applicable whenever we have a variable $t_1$
with a transitive instance $u_n$ which is also transitively a component of $t_1$.

When the extended occurs check fires in SEMI, the solver simply forms a recursive type
by adding the constraint $t_1 = u_n$, and continues. In the example, the extended occurs check
detects the constraints $\{ T_f \rhd_{result} T_r,\ T_f \preceq_i T_r \}$ and adds the constraint $T_r = T_f$,
halting the expansion.

Note  that  adding  this  equality  forces  variables  to  be  equal  that  do  not  necessarily  need  to  be 
equal  according  to  the  initial  constraints.  This  is  why  SEMI  does  not  compute  a  most 
general  (i.e.,  principal)  solution.  The  demonstration  of  non-existence  of  principal  types  in 
Appendix  A  is  based  on  a  similar  example. 

The implementation of the SEMI solver performs an extended occurs check whenever the
instance propagation rule adds a new instance constraint $t \preceq_i u$ to C. It sets $u_{n-1} = t$,
$u_n = u$, and $i_n = i$, and then searches the component and instance graphs for a variable $t_1$
satisfying the check. Any such variables found are bound to u. The search proceeds by first
scanning the instance graph backwards, finding all candidate $t_1$s that are transitive sources
of t (including t itself), and for each candidate, scanning its components transitively looking
for u.
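
The search just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the SEMI code: variables are plain strings, constraint labels are dropped, and the class and method names (OccursCheckSketch, extendedOccursCheck) are hypothetical. The method returns the candidates $t_1$ that the solver would bind to u.

```java
import java.util.*;

/** Hypothetical sketch of the naive extended occurs check over unlabelled graphs. */
final class OccursCheckSketch {
    // sources.get(u) = variables t with an instance constraint t <= u
    private final Map<String, Set<String>> sources = new HashMap<>();
    // components.get(t) = variables v with a component constraint t >c v
    private final Map<String, Set<String>> components = new HashMap<>();

    void addInstance(String t, String u) {
        sources.computeIfAbsent(u, k -> new LinkedHashSet<>()).add(t);
    }

    void addComponent(String t, String v) {
        components.computeIfAbsent(t, k -> new LinkedHashSet<>()).add(v);
    }

    /** Variables reachable from start by following one or more edges. */
    private static Set<String> reachable(Map<String, Set<String>> edges, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String x = stack.pop();
            for (String y : edges.getOrDefault(x, Collections.emptySet())) {
                if (seen.add(y)) stack.push(y);
            }
        }
        return seen;
    }

    /** To be called after adding t <= u: every t1 that has u as a transitive component. */
    List<String> extendedOccursCheck(String t, String u) {
        Set<String> candidates = new LinkedHashSet<>();
        candidates.add(t);                              // t itself is a candidate
        candidates.addAll(reachable(sources, t));       // scan the instance graph backwards
        List<String> hits = new ArrayList<>();
        for (String t1 : candidates) {
            if (reachable(components, t1).contains(u)) hits.add(t1);
        }
        return hits;
    }
}
```

For the example set, calling extendedOccursCheck("Tf", "Tr") after adding the component and instance constraints between Tf and Tr reports Tf, the variable that is then equated with Tr.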

This check could easily be changed from worst case $O(N^2)$ time, where N is the number of
variables, to O(N) time, simply by finding all transitive sources of t first, storing them in a
hashtable-based set, then scanning all of u’s transitive parents (variables that have u as a
transitive component) and testing for membership in the set. In practice, however, the
average numbers of transitive instances, sources, components or parents that a variable has
are all very large, and a check that is linear time in any of these quantities is prohibitively
expensive (since the extended occurs check is performed frequently). Therefore SEMI uses
a more complex approach, described below, which builds on the basic algorithm above. It
turns out that with the help of those optimizations, the worst case $O(N^2)$ version performs
significantly better.
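
For comparison, the set-based variant can be sketched in the same style. As before this is only an illustration with hypothetical names (LinearOccursCheck, check): it collects the transitive sources of t in a hash set and then walks the transitive parents of u once, so each edge is followed at most once per check.

```java
import java.util.*;

/** Hypothetical sketch of the set-based extended occurs check. */
final class LinearOccursCheck {
    private final Map<String, Set<String>> sources = new HashMap<>(); // u -> {t : t <= u}
    private final Map<String, Set<String>> parents = new HashMap<>(); // v -> {t : t >c v}

    void addInstance(String t, String u) {
        sources.computeIfAbsent(u, k -> new HashSet<>()).add(t);
    }

    void addComponent(String t, String v) {
        parents.computeIfAbsent(v, k -> new HashSet<>()).add(t);
    }

    /** Variables reachable from start by following one or more edges. */
    private static Set<String> reachable(Map<String, Set<String>> edges, String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String x = stack.pop();
            for (String y : edges.getOrDefault(x, Collections.emptySet())) {
                if (seen.add(y)) stack.push(y);
            }
        }
        return seen;
    }

    /** Transitive sources of t (including t) that are also transitive parents of u. */
    Set<String> check(String t, String u) {
        Set<String> srcs = reachable(sources, t);
        srcs.add(t);
        Set<String> hits = new TreeSet<>();   // sorted only to make the result stable
        for (String p : reachable(parents, u)) {
            if (srcs.contains(p)) hits.add(p);
        }
        return hits;
    }
}
```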

7.2.7  Nondeterminism 

The  algorithm  presented  here  is  nondeterministic,  as  are  all  the  following  elaborations  and 
the  implementation  itself.  There  is  always  flexibility  in  choosing  the  order  in  which  to 
remove  constraints  from  the  worklist.  Different  orderings  can  lead  to  different  results  of  the 
algorithm,  because  the  extended  occurs  check  may  fire  at  different  times  and  induce 
different  equality  constraints. 

The  implementation  also  produces  non-deterministic  results  because  it  is  written  in  Java, 
and  Java’s  semantics  does  not  fully  define  the  behavior  of  the  implementation.  In 
particular, the “identity hash code” of an object is not defined by the Java language
specification. The identity hash code is returned by the default implementation of
Object.hashCode(); the only requirement is that it always return the same value for
any given object. When the same program is run multiple times on the same Java virtual
machine implementation, the identity hash codes assigned to its objects are often observed
to vary between runs. This leads to observable variations in behavior, because the
enumeration order of the elements of hash tables and related data structures depends on the values
of  the  identity  hash  codes. 
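
The effect is easy to observe in isolation. The sketch below is not part of Ajax; the class IterationOrderDemo and its nested Var class are hypothetical. Var does not override hashCode(), so a HashSet of Var objects is ordered by identity hash codes, while a LinkedHashSet (shown only for contrast, not as something SEMI is claimed to use) enumerates in insertion order.

```java
import java.util.*;

/** Hypothetical demonstration of enumeration order depending on identity hash codes. */
final class IterationOrderDemo {
    /** Does not override hashCode(), so hash-based sets use the identity hash code. */
    static final class Var {
        final String name;
        Var(String name) { this.name = name; }
    }

    /** Names of the variables in the collection's own enumeration order. */
    static List<String> names(Collection<Var> vars) {
        List<String> out = new ArrayList<>();
        for (Var v : vars) out.add(v.name);
        return out;
    }

    public static void main(String[] args) {
        List<Var> vars = new ArrayList<>();
        for (int i = 0; i < 8; i++) vars.add(new Var("t" + i));
        // Depends on identity hash codes; may differ from one run to the next.
        System.out.println("HashSet:       " + names(new HashSet<>(vars)));
        // Insertion order, independent of hash codes.
        System.out.println("LinkedHashSet: " + names(new LinkedHashSet<>(vars)));
    }
}
```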

In practice, Ajax almost always returns the same results for multiple runs of a given query.

7.3 Optimizing the Occurs Check: Clusters

The  naive  approach  to  performing  the  extended  occurs  check  can  be  sped  up  by  exploiting 
the  structure  of  constraints  induced  by  a  Java  program  (or  any  program  that  has  layers  in  its 
architecture,  i.e.,  almost  all  programs). 

7.3.1  Constraint  Structure 

SEMI  generates  instance  constraints  from  a  Java  program  in  the  following  situations: 

• A method body $M_1$ makes a “static” call to another method $M_2$ ($M_1$ depends on $M_2$).

• A method body $M_1$ creates a new object of a class C ($M_1$ depends on C).

• A method body $M_1$ is installed in the dynamic dispatch table of a class C (C depends on
$M_1$).

Due  to  the  layered  structure  of  most  programs,  the  graph  of  dependencies  is  “mostly” 
acyclic.  (However,  the  JDK  class  library  itself  contains  a  number  of  surprisingly  complex 
cycles,  so  it  is  important  to  be  able  to  handle  cycles  well.) 

7.3.2  Clusters 

Normally  (i.e.,  in  the  absence  of  a  cycle  of  mutually  recursive  dependencies),  the  variables 
associated  with  parameters,  local  variables,  results,  and  intermediate  values  within  a  given 
method,  and  variables  which  are  components  of  those  variables,  are  related  only  by 
component  constraints.  Instance  constraints  (and  only  instance  constraints)  relate  these 
variables  to  variables  associated  with  other  methods.  Similarly,  in  a  class  there  are  variables 
associated  with  the  method  slots,  and  a  variable  for  the  prototype  object  of  the  class,  which 
are  related  to  each  other  by  component  constraints  only.  Instance  constraints  relate  these 
variables  to  variables  in  the  methods  that  create  objects  of  the  class,  and  to  variables  in  the 
method  bodies  used  by  the  class. 

The SEMI solver explicitly captures this structure. The variables are partitioned into
abstract clusters; the partition is written $R : V \to X$ (where X is the set of cluster labels).
The only required property of R is that if $t \rhd_c u$ is a constraint, then R(t) = R(u). In other
words, all variables related by only component constraints are in the same cluster.
Typically, Java programs give rise to a large number of small clusters (one cluster per
method).

It is not strictly necessary to have R be the most refined partition possible, but that is easy
to implement and gives the best results. That is, if t and u are not related by any chain of
component constraints, ignoring direction, then $R(t) \neq R(u)$.

The  implementation  maintains  the  cluster  map  dynamically,  taking  account  of  variable 
merging  and  the  introduction  of  new  constraints. 
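
One natural way to maintain such a partition incrementally is a union-find structure. The text does not say how SEMI represents R, so the sketch below is an assumption throughout: the class ClusterMap and its methods are hypothetical, and variables are plain strings. Each component constraint (or variable merge) unions the clusters of its two endpoints, which yields the most refined partition satisfying the required property.

```java
import java.util.*;

/** Hypothetical union-find sketch of the cluster map R. */
final class ClusterMap {
    private final Map<String, String> parent = new HashMap<>();

    /** Returns the cluster label R(v), i.e. the representative of v's set. */
    String find(String v) {
        parent.putIfAbsent(v, v);
        String root = v;
        while (!parent.get(root).equals(root)) root = parent.get(root);
        // Path compression: point every variable on the path directly at the root.
        String cur = v;
        while (!cur.equals(root)) {
            String next = parent.get(cur);
            parent.put(cur, root);
            cur = next;
        }
        return root;
    }

    /** Records a component constraint t >c u (or a merge of t and u). */
    void addComponent(String t, String u) {
        String rt = find(t), ru = find(u);
        if (!rt.equals(ru)) parent.put(rt, ru);
    }

    /** True when R(t) = R(u). */
    boolean sameCluster(String t, String u) {
        return find(t).equals(find(u));
    }
}
```

With this representation, variables belonging to two different methods stay in different clusters until some component constraint or merge connects them.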

7.3.3  Optimizing  the  Extended  Occurs  Check  Using  Clusters 

The cluster map is used to short-circuit the subroutine that computes “Is u a transitive
component of $t_1$?” If $R(u) \neq R(t_1)$, then the result must be false. Since clusters are generally
small and numerous, and following an instance constraint usually leads to another
(different) cluster, $R(u) \neq R(t_1)$ almost always holds during the extended occurs check
search.

7.3.4  Cluster  Levels 

Unfortunately, even scanning all transitive sources of a variable and performing a
constant-time check for each is too expensive, given the frequency with which extended occurs
checks are performed.

SEMI resolves this problem by explicitly capturing the “mostly acyclic” structure of the
inter-cluster instance graph. The instance constraints are projected onto the clusters; i.e.,
the clusters are assembled into a directed graph G such that for each $t \preceq_i u$, (R(t), R(u)) is
an edge in G. Then the graph is partitioned into strongly connected components, called
cluster levels. This partition is written $S : X \to Z$, where Z is the set of cluster level labels.
By definition, G projected onto cluster levels is acyclic (excluding self-loops). The fact that
G itself is “mostly acyclic” means that most cluster levels contain just one cluster.
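
As a point of reference, the partition S can be computed in one batch with Tarjan's strongly connected components algorithm. The sketch below is hypothetical (the class ClusterLevels and its methods are introduced here) and recomputes everything from scratch, whereas SEMI maintains the levels dynamically; it is meant only to make the definition concrete. It uses recursion, so very deep graphs would need an explicit stack.

```java
import java.util.*;

/** Hypothetical batch computation of cluster levels (SCCs of the cluster graph G). */
final class ClusterLevels {
    private final Map<String, Set<String>> edges = new LinkedHashMap<>();
    private final Map<String, Integer> index = new HashMap<>();
    private final Map<String, Integer> low = new HashMap<>();
    private final Deque<String> stack = new ArrayDeque<>();
    private final Set<String> onStack = new HashSet<>();
    private final Map<String, Integer> level = new HashMap<>();
    private int counter = 0;
    private int levels = 0;

    /** Adds the edge (R(t), R(u)) induced by an instance constraint t <= u. */
    void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
        edges.computeIfAbsent(to, k -> new LinkedHashSet<>());
    }

    /** Returns S: cluster label -> cluster level number. */
    Map<String, Integer> compute() {
        for (String c : edges.keySet()) {
            if (!index.containsKey(c)) visit(c);
        }
        return level;
    }

    private void visit(String c) {
        index.put(c, counter);
        low.put(c, counter);
        counter++;
        stack.push(c);
        onStack.add(c);
        for (String d : edges.get(c)) {
            if (!index.containsKey(d)) {
                visit(d);
                low.put(c, Math.min(low.get(c), low.get(d)));
            } else if (onStack.contains(d)) {
                low.put(c, Math.min(low.get(c), index.get(d)));
            }
        }
        if (low.get(c).equals(index.get(c))) {
            // c is the root of a strongly connected component: pop it off as one level.
            String d;
            do {
                d = stack.pop();
                onStack.remove(d);
                level.put(d, levels);
            } while (!d.equals(c));
            levels++;
        }
    }
}
```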

The  implementation  maintains  the  cluster  levels  dynamically,  as  the  underlying  constraint 
system  changes.  SEMI  does  this  efficiently,  but  the  implementation  is  tricky  because 
detecting  cycles  can  be  expensive.  It  is  helpful  to  delay  cycle  detection  until  the  cluster 
levels  are  required  to  be  in  a  consistent  (acyclic)  state  (i.e.,  until  the  next  extended  occurs 
check).  SEMI  maintains  a  “dirty”  bit  for  each  cluster  level,  indicating  that  it  may  be  part  of 
a  cycle  of  cluster  levels  because  of  the  addition  of  new  instance  constraints  incident  to  the 
cluster  level.  When  acyclicity  is  required,  the  algorithm  performs  a  worst-case  linear  time 
traversal  of  the  cluster  level  graph  —  a  depth-first  search  backwards  along  the  instance 
edges,  starting  from  the  dirty  cluster  levels.  Any  cycles  found  are  recorded.  Finally,  the 
cluster  levels  in  each  cycle  are  merged.  It  requires  care  to  make  sure  that  all  cycles  are 
detected,  since  the  straightforward  depth-first  search  algorithm  for  cycle  detection  is  only 
guaranteed  to  find  one  cycle  (assuming  a  cycle  exists). 

In  SEMI,  the  cost  of  maintaining  the  cluster  levels  is  usually  negligible  and  never  the 
performance  bottleneck. 

7.3.5  Optimizing  the  Extended  Occurs  Check  Using  Cluster  Levels 

The cluster level map is used to optimize the subroutine that scans the source graph for all
candidate $t_1$s that are transitive sources of t.

The extended occurs check subroutine receives t and u where u is an instance of t. Therefore
every candidate has u as a transitive instance. Now suppose for some candidate $t_1$,
$S(R(u)) \neq S(R(t_1))$. There must be a path from $S(R(t_1))$ to $S(R(u))$ in the instance graph
projected onto the cluster levels, because there is a path from $t_1$ to u in the instance graph.
Because the cluster level instance graph is acyclic, there cannot be a path from $S(R(u))$ to
$S(R(t_1))$. Therefore, for all transitive sources s of $t_1$, $S(R(s)) \neq S(R(u))$ and therefore
$R(s) \neq R(u)$, because otherwise we would have an instance path from $S(R(s)) = S(R(u))$ to
$S(R(t_1))$.

Therefore, whenever the extended occurs check subroutine detects $S(R(u)) \neq S(R(t_1))$, $t_1$’s
sources need not be searched. In practice this prunes the search tremendously. In particular,
if $S(R(u)) \neq S(R(t))$ then neither t nor its sources need be checked; the entire check takes
constant time.

In the special case in which there are no recursive dependencies in the original program,
the instance graph projected onto clusters is acyclic, i.e., S is one-to-one. Then the extended
occurs check always completes in constant time. In other words, this optimization ensures
that the extended occurs check only incurs a cost (apart from the cost of maintaining the
clusters and cluster levels) when polymorphic recursion is actually being used.

7.3.6  Replacing  the  Extended  Occurs  Check  with  a  Conservative 
Approximation 

In the case $S(R(u)) = S(R(t))$, instead of performing the rest of the extended occurs check,
one could simply add the equality constraint t = u. The new instance constraint $t \preceq_i u$ is
reduced to a self-loop in the instance graph, which forestalls the nonterminating behavior
that the extended occurs check is designed to prevent. This approach is similar to the
Hindley-Milner algorithm, which (interpreted in this context) prohibits any polymorphism
constraints within a cluster level. This behavior can lead to smaller constraint sets because
of the “unnecessary” equalities that are introduced, which improves performance but does
yield a noticeable decrease in accuracy for some applications of the analysis.

7.4  Scheduling  the  Worklist  Using  Cluster  Levels 

It  turns  out  that  the  acyclic  cluster  level  graph  is  useful  for  tasks  other  than  optimizing  the 
extended  occurs  check. 

7.4.1  The  Scheduling  Problem 

Components  propagate  from  sources  to  instances,  but  not  the  other  way  around.  Therefore 
as  changes  are  made  to  constraints  at  the  “bottom”  of  the  instance  graph,  they  tend  to 
“bubble  up”  to  instances.  It  improves  performance  to  do  as  much  work  as  possible  at  the 
bottom  of  the  instance  graph  before  making  changes  further  up  the  graph,  by  reducing  the 
number  of  times  each  component  is  visited  or  examined. 

7.4.2  Using  Cluster  Levels 

A cluster level l is “dirty” if there is a component constraint in the worklist of the form
$t \rhd_c u$, where S(R(t)) = l.

Whenever SEMI chooses a component constraint from the worklist, it chooses a constraint
$t \rhd_c u$ where the cluster level S(R(t)) has no dirty cluster levels below it in the instance
graph projected onto the cluster levels. Such a constraint is guaranteed to exist because the
cluster level instance graph is acyclic.

Making  this  choice  efficiently  is  tricky,  but  requires  negligible  time  and  space  in  the  SEMI 
implementation.  The  dirty  component  constraints  are  stored  on  the  worklist  indexed  by 
cluster  levels;  the  problem  reduces  to  finding  an  appropriate  cluster  level  to  work  on.  SEMI 
explicitly  records  the  dirtiness  of  each  cluster  level.  It  also  caches  two  facts  in  each  cluster 
level:  whether  it  is  known  that  there  is  at  least  one  dirty  cluster  level  below  it  in  the  cluster 
level  instance  graph,  and  whether  it  is  known  that  there  are  no  dirty  cluster  levels  below  it 
in  the  graph.  In  practice,  this  cache  can  be  updated  and  invalidated  efficiently  in  response 
to changes in dirty state and changes in the underlying constraint set.

The system keeps a list of dirty cluster levels, separated into two parts: the set of dirty
cluster  levels  that  are  known  to  have  no  dirty  cluster  levels  below  them  on  the  projected 
instance  graph  (the  “ready  list”),  and  the  rest  (the  “blocked  list”).  When  a  constraint  is 
selected  from  the  worklist,  if  the  ready  list  is  non-empty  then  a  cluster  level  is  chosen  from 
it  and  one  of  the  cluster  level’s  dirty  constraints  is  selected. 

If the ready list is empty, then a cluster level l is chosen from the blocked list. The algorithm
performs a depth-first search of the cluster level instance graph, backwards from l, from
instances  to  sources.  During  this  search,  each  visited  cluster  level  is  marked  as  either  having 
dirty  cluster  levels  below  it,  or  not.  If  not,  then  the  visited  cluster  level  is  moved  from  the 
blocked  list  to  the  ready  list.  The  acyclicity  of  the  cluster  level  instance  graph  guarantees 
that  after  this  procedure,  at  least  one  dirty  cluster  level  will  be  found  with  no  dirty  cluster 
levels  below  it  (unless  there  are  no  dirty  cluster  levels  left,  in  which  case  the  algorithm 
terminates). 
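
The selection of ready cluster levels can be sketched as follows. This is a simplified illustration, not the SEMI scheduler: the class LevelScheduler and its methods are hypothetical, the persistent ready and blocked lists and their cache invalidation are replaced by a memo table rebuilt on every call, and the cluster level graph is assumed to be acyclic already.

```java
import java.util.*;

/** Hypothetical sketch: pick dirty cluster levels with no dirty level below them. */
final class LevelScheduler {
    // below.get(l) = cluster levels that are direct sources of l (one step down).
    private final Map<String, Set<String>> below = new HashMap<>();
    private final Set<String> dirty = new LinkedHashSet<>();

    /** Records an edge of the projected instance graph from a source level to an instance level. */
    void addEdge(String sourceLevel, String instanceLevel) {
        below.computeIfAbsent(instanceLevel, k -> new LinkedHashSet<>()).add(sourceLevel);
    }

    void markDirty(String level) { dirty.add(level); }

    void markClean(String level) { dirty.remove(level); }

    /** The "ready list": dirty levels that have no dirty level anywhere below them. */
    Set<String> readyLevels() {
        Map<String, Boolean> memo = new HashMap<>();
        Set<String> ready = new LinkedHashSet<>();
        for (String l : dirty) {
            if (!hasDirtyBelow(l, memo)) ready.add(l);
        }
        return ready;
    }

    /** Depth-first search from instances to sources; terminates because the graph is acyclic. */
    private boolean hasDirtyBelow(String l, Map<String, Boolean> memo) {
        Boolean cached = memo.get(l);
        if (cached != null) return cached;
        boolean result = false;
        for (String s : below.getOrDefault(l, Collections.emptySet())) {
            if (dirty.contains(s) || hasDirtyBelow(s, memo)) {
                result = true;
                break;
            }
        }
        memo.put(l, result);
        return result;
    }
}
```

For a chain of levels A, B, C in which A is a source of B and B is a source of C, marking B and C dirty makes only B ready; C becomes ready once everything below it is clean.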

7.5  Suppressing  Components:  Advertisements 

7.5.1  Useless  Component  Propagation 

Suppose F is a function in the program for which we infer a large “type”, $T_F$. This means
that $T_F$ is the root of a large graph of component constraints. At every use of F (a direct call
or the use of F to fill a slot in a method table), a new instance t of $T_F$ is created, and a
constraint $T_F \preceq_i t$ is added. The component propagation rule will effectively copy the
transitive components of $T_F$ (i.e., the component graph under $T_F$) to the instance. Often,
however, much of this structure will not be used. For example, consider this Java code:

Foo x = bar();
println(x.kitty);

Given  the  code  for  bar,  the  analysis  may  work  out  some  complex  type  structure  for  its 
return  value,  including  information  about  the  various  methods  and  fields  of  x.  All  this 
information  will  be  propagated  to  the  caller,  but  only  one  field  is  used,  and  therefore  the 
rest  of  the  information  is  irrelevant. 

Furthermore,  suppose  bar  is  implemented  as  a  wrapper: 

Foo bar() { return baz(5); }

Such  constructs  are  common,  and  defeat  purely  local  schemes  for  suppressing  useless 
structure. 

7.5.2  Illustration 

Consider the constraint set Q shown in Figure 7-1. This diagram and the diagrams that
follow represent constraint sets as graphs. Nodes correspond to variables. A constraint of
the form $t \rhd_c u$ is displayed as a solid edge from t’s node to u’s node labelled with $\rhd_c$. A
constraint of the form $t \preceq_i u$ is displayed as a dotted edge from t’s node to u’s node labelled
with $\preceq_i$.

T represents the type of some compound object with an instance i and further instances j
and k. Assume Q contains the initial constraint set, $C_0$. The basic algorithm extends Q to the
closed set C shown in Figure 7-2.


The  basic  algorithm  reaches  C  by  copying  T’s  component  tree  to  all  the  instances,  and 
connecting  the  components  with  instance  relationships. 

7.5.3  Quasi-closure  Conditions 

These new components are all unnecessary: Q is, in fact, quasi-closed. To see this,
consider two variables in $C_0$, u and v. We must show that u and v are related in Q if and only
if they are related in C.

The notation “$u \preceq^* v$” means that there is a chain of instance constraints from u to v; a
subscript, as in $u \preceq^*_Q v$ or $u \preceq^*_C v$, names the constraint set that contains the chain.

There are two cases:

• Suppose u and v are not related in C. Then $\neg\exists x.\ u \preceq^*_C x \land v \preceq^*_C x$. It follows that
$\neg\exists x.\ u \preceq^*_Q x \land v \preceq^*_Q x$, since C is a superset of Q. Therefore u and v are not related in Q.

• Suppose u and v are related according to C. Then $\exists x.\ u \preceq^*_C x \land v \preceq^*_C x$. We show that
$\exists p.\ u \preceq^*_Q p \land v \preceq^*_Q p$, by induction on the length of the shortest chain of instances
justifying $u \preceq^*_C x$.

Regardless of the length of the chain, if x occurs in Q, then $u \preceq^*_Q x \land v \preceq^*_Q x$, since the
chains of instances justifying $u \preceq^*_C x$ and $v \preceq^*_C x$ are also in Q. (In other words, every
instance constraint in C that holds between variables in Q is already in Q.) Thus the
induction hypothesis holds, setting p = x.

If the length of the chain is zero, then x = u, hence x is in Q and the hypothesis holds.

If x is not in Q, then it must be a child variable of one of the new component
constraints. Each such variable has a unique predecessor $P_x$ in C such that $P_x \preceq_i x$. The
chains $u \preceq^*_C x$ and $v \preceq^*_C x$ must have length at least one, since x is not in Q and therefore
does not equal u or v. Therefore the last link of each chain must be $P_x \preceq_i x$. Therefore,
$u \preceq^*_C P_x \land v \preceq^*_C P_x$ also holds. By the induction hypothesis, $\exists p.\ u \preceq^*_Q p \land v \preceq^*_Q p$.

This argument can be generalized. A general set Q is quasi-closed over $C_0$ if:

1. Equalities have been eliminated from Q, and it is closed under the instance and
component consistency rules (guaranteed by my representation).

2. Q contains $C_0$.

3. Q is closed under the instance propagation rule.

4. For all t, u, v, c, x, y, if $t \preceq^*_Q u \land u \preceq^*_Q v \land \{ t \rhd_c x,\ v \rhd_c y \} \subseteq Q$, then there is a w
such that $\{ u \rhd_c w \} \subseteq Q$.

5. For all t, u, c, v, if $t \preceq^*_Q u \land \{ t \rhd_c v \} \subseteq Q$ but $\{ u \rhd_c w \}$ is not in Q for any w, then the
set $\{ x \mid \exists j, w, y.\ y \preceq^*_Q u \land \{ x \preceq_j y,\ x \rhd_c w \} \subseteq Q \land \neg(\exists z.\ \{ y \rhd_c z \} \subseteq Q) \} = \{ t \}$.

Conditions  1  and  2  are  fundamental.  Conditions  3  and  4  are  required  to  justify  the  “x  in  Q” 
part  of  the  proof;  they  require  Q  to  be  closed  except  possibly  for  some  unexpanded 
instances  of  compound  structures.  Condition  5  is  required  to  justify  the  “x  not  in  Q”  part  of 
the  proof;  it  ensures  that  if  a  component  c  is  not  propagated  to  u,  then  there  is  a  unique 
instance-chain  predecessor  that  has  a  real  component  that  we  can  fall  back  to. 

7.5.4  Advertisements 

The system reaches this state by propagating components lazily. When the component
propagation rule fires, it actually propagates an advertisement, representing the possibility
of a component being present in the instance. An advertisement is a pair: the parent
variable, v, and a component label, c, written $v \rhd_c$. These advertisements are propagated
along the instance graph using two rules:

• Advertisement propagation from component

Upon detecting $\{ t \preceq_i u,\ t \rhd_c v \} \subseteq C$ for some t, u, v, i and c, add $u \rhd_c$.

• Advertisement propagation from advertisement

Upon detecting $\{ t \preceq_i u,\ t \rhd_c \} \subseteq C$ for some t, u, i and c, add $u \rhd_c$.

If a variable t already has a component c, then it does not need an advertisement for the
same component.

• Redundant advertisement suppression

Upon detecting $\{ t \rhd_c,\ t \rhd_c v \} \subseteq C$ for some t, v and c, delete $t \rhd_c$.

These rules replace the component propagation rule. They guarantee that quasi-closure
conditions 1, 2 and 3 hold upon termination.

7.5.5  Example 

Consider Figure 7-3. Instead of copying T’s entire component tree, we have added
advertisements for T’s immediate components.


7.5.6  Ensuring  Quasi-closure:  Fill-in 

To  satisfy  quasi-closure  condition  4,  the  algorithm  “fills  in”  an  advertisement  that  has  a  real 
component  above  it  in  the  instance  graph: 

•  Advertisement  fill-in 

Upon detecting $\{ t \preceq_i u,\ t \rhd_c,\ u \rhd_c w \} \subseteq C$ for some t, u and w, add $t \rhd_c v$, where v is a
fresh variable.

For  example,  consider  the  initial  set  shown  in  Figure  7-4. 

SEMI  adds  an  advertisement  between  T  and  U,  as  shown  in  Figure  7-5.  The  fill-in  rule  will 
ensure  that  the  advertisement  is  replaced  with  a  real  component,  as  shown  in  Figure  7-6. 
The  instance  propagation  rule  will  then  ensure  that  the  instance  chain  from  Tc  to  Uc  is 
completed, as shown in Figure 7-7.

Figure 7-7. After fill-in


7.5.7  Ensuring  Quasi-closure:  Detecting  Conflicting  Sources 

To satisfy quasi-closure condition 5, each advertisement is associated with an
advertisement source, s, that records the variable the advertisement is derived from. The
advertisement is written $t \rhd_c [s]$. Quasi-closure condition 5 becomes the “unique source
condition”:

If the advertisement $u \rhd_c [s]$ exists, then

$\{ x \mid \exists j, w, y.\ y \preceq^*_C u \land \{ x \preceq_j y,\ x \rhd_c w \} \subseteq C \land (\forall z.\ \text{“}y \rhd_c z\text{”} \notin C) \} = \{ s \}$.

The advertisement rules are extended:

• Advertisement propagation from component

Upon detecting $\{ t \preceq_i u,\ t \rhd_c v \} \subseteq C$ for some t, u, v, i and c, add $u \rhd_c [t]$.

• Advertisement propagation from advertisement

Upon detecting $\{ t \preceq_i u,\ t \rhd_c [s] \} \subseteq C$ for some t, u, s, i and c, add $u \rhd_c [s]$.

• Redundant advertisement suppression

Upon detecting $\{ t \rhd_c [s],\ t \rhd_c v \} \subseteq C$ for some t, s, v and c, delete $t \rhd_c [s]$.

• Advertisement fill-in

Upon detecting $\{ t \preceq_i u,\ t \rhd_c [s],\ u \rhd_c w \} \subseteq C$ for some t, u, s and w, add $t \rhd_c v$, where v
is a fresh variable.

When  a  conflict  arises  —  two  advertisements  for  the  same  component  show  different 
sources  —  we  collapse  the  advertisements  and  make  a  real  component. 

•  Conflicting  advertisement  detection 

Upon detecting $\{ t \rhd_c [s],\ t \rhd_c [r] \} \subseteq C$ for some t, s, c and r, where $r \neq s$, create a new
w and add $t \rhd_c w$.

This  rule  tests  for  the  inequality  of  two  variables.  This  can  be  tricky  because  variables  can 
become  equal  during  the  run  of  the  algorithm,  but  in  fact  it  only  means  that  conflicts  may 
be  detected  that  in  the  end  may  not  be  “true”  conflicts.  Since  replacing  an  advertisement 
with  a  real  component  is  always  a  conservative  operation  (possibly  hurting  performance, 
but  never  correctness),  this  is  not  a  problem. 

The  conflicting  advertisement  rule  guarantees  that  upon  termination,  the  unique  source 
condition  is  satisfied. 

7.5.8  Simple  Example 

For  example,  consider  the  Q  in  Figure  7-8. 

The algorithm propagates advertisements from U and T to V, but since $U \neq T$, the conflict
detection  rule  fires  and  a  real  component  is  created  for  V.  This  is  necessary  to  make  the 
result  quasi-closed. 

7.5.9  Advertisement  Source  Updates 

The  conflicting  advertisement  detection  rule  alone  is  not  satisfactory,  however.  Consider 
the  example  in  Figure  7-9. 

Suppose  the  algorithm  propagates  an  advertisement  from  T  to  V  and  then  W,  and  then 
propagates  an  advertisement  from  U  to  V.  (This  schedule  might  be  chosen  because  of 
additional  constraints  not  shown.)  Now  at  V  there  are  conflicting  advertisements,  with 
sources U and T. The algorithm creates a real component at V. The resulting state is shown
in Figure 7-10. Next the algorithm propagates an advertisement for that component to W.
Now there are conflicting advertisements at W, with sources T and V, so a new component
must also be created at W. This is suboptimal because W could simply have an
advertisement with source V.

To  avert  such  situations,  it  suffices  to  destroy  the  advertisements  that  could  be  affected  by 
a  new  component;  they  will  be  regenerated  with  correct  source  information,  if  possible. 

•  Advertisement  source  update 

Upon detecting $\{ t \rhd_c [s],\ y \rhd_c z \} \subseteq C$ for some t, s, y, z and c, where $s \preceq^*_C y \land y \preceq^*_C t$ and
$s \neq y$, delete “$t \rhd_c [s]$”.

7.5.10  Implementation 

Advertisement  constraints  are  easily  added  by  treating  them  as  a  degenerate  kind  of 
component.  Propagation  and  fill-in  detection  are  implemented  by  allowing  advertisements 
as  well  as  components  to  be  on  the  dirty  worklist.  Conflicting  advertisement  detection  is 
straightforward  to  implement  and  is  done  eagerly. 
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The  advertisement  source  update  is  difficult  to  implement  efficiently.  The  straightforward 
implementation  can  destroy  and  recreate  many  advertisements  each  time  a  component  is 
added.  SEMI  uses  an  alternative  representation  for  the  source  field  of  an  advertisement.  An 
advertisement for c at t records a “bottleneck variable” v such that every instance chain
from the true source s to t passes through v. v may be s, or it may be some instance of s, in
which case v also has an advertisement for c (and its own bottleneck variable, etc.). The true
source s for t can be found quickly; it is either v, or it is v’s true source. When v is not s,
components may be added along the path from s to v without having to update the information
cached in the advertisement at t.
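
The lookup of the true source through bottleneck variables can be sketched in Java as follows. This is only an illustration under assumptions: the class name, the use of strings for constraint variables, and the single map for one fixed component label c are all hypothetical and are not taken from the SEMI implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; class and method names are hypothetical, not SEMI's. */
final class BottleneckSketch {
    // For one fixed component label c: maps a variable that holds an
    // advertisement to the bottleneck variable cached in that advertisement.
    private final Map<String, String> bottleneck = new HashMap<>();

    /** Records (or replaces) the advertisement at t, caching bottleneck variable v. */
    void advertise(String t, String v) {
        bottleneck.put(t, v);
    }

    /** Follows bottleneck links until reaching a variable with no advertisement of its own. */
    String trueSource(String t) {
        String current = t;
        while (bottleneck.containsKey(current)) {
            current = bottleneck.get(current);
        }
        return current;
    }

    public static void main(String[] args) {
        BottleneckSketch ads = new BottleneckSketch();
        ads.advertise("v", "s"); // v is an instance of the source s
        ads.advertise("t", "v"); // t caches only its bottleneck v
        System.out.println(ads.trueSource("t"));
        ads.advertise("v", "a"); // a component appears at a, between s and v
        System.out.println(ads.trueSource("t"));
    }
}
```

In this sketch the advertisement at t is never rewritten when the source of v changes, which is the saving the bottleneck representation is meant to provide.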

7.6  Globals 

7.6.1  Handling  Program  Global  Variables 

It  is  straightforward  to  encode  a  program’s  global  variables  (“static  fields”  in  Java)  in  the 
constraint  system  presented.  They  can  be  treated  as  a  single  “globals”  object  with  one  field 
for  each  variable,  which  is  passed  into  each  function  as  a  parameter.  However,  this  is  not 
very  efficient  because  globals  information  must  be  copied  into  each  method  type.  It  is  much 
more  efficient,  and  no  less  accurate,  to  have  just  one  variable  representing  the  globals 
object  and  one  copy  of  the  information  for  the  global  variables.  Lemma  6-21  shows  that  this 
is  no  less  accurate.  The  lemma  states  that  the  information  inferred  for  the  globals  object  in 
any  context  is  always  the  same. 

7.6.2  Characterization  of  Constraints  for  Globals 

In terms of the constraints, a constraint variable v in an initial set $C_j$ can be said to be global
if, for all closed sets C containing $C_j$, $\exists g.\ \forall y.\ v \preceq_C y \Rightarrow y \preceq_C g$. This means that there is a
“top  level”  constraint  variable  g  representing  all  instances  of  the  global  data.  Lemma  6-21 
shows  that  the  constraint  variables  corresponding  to  static  fields  in  the  bytecode  have  this 
property.

It is easy to see that an instance of a global constraint variable is also global. Furthermore,
a  component  of  a  global  constraint  variable  is  also  global,  because  all  instance  chains 
propagate  down  the  component  constraint. 

Suppose that global constraint variables t and u are related according to the VPR approximation
derived from a quasi-closed constraint set (Section 7.1.3). Then $\exists x.\ t \preceq_C x \wedge u \preceq_C x$.

Choose g such that $\forall y.\ t \preceq_C y \Rightarrow y \preceq_C g$. Then $x \preceq_C g$, and therefore $u \preceq_C g$. This implies
that g and u are related according to the VPR. Thus, t’s global representative g behaves
identically to t in the VPR. We can unify all global constraint variables with their global
representatives without changing the derived VPR.

7.6.3  Implementation 

SEMI  marks  constraint  variables  corresponding  to  static  Java  variables  as  global  and  gives 
these  constraint  variables  special  treatment: 

• If $t \rhd_c v$ and t is global then v is marked global.

• When a global constraint variable is unified with another constraint variable, the resulting
variable is marked global.

• If $t \preceq_i u$ and t is global, then the algorithm sets t = u and deletes the instance constraint.
This leads to u being marked global.

• Global variables do not belong to any cluster or cluster level. The cluster invariant is
modified to “if $t \rhd_c v$ and v is not global then t and v belong to the same cluster”. The
scheduler keeps a separate list of dirty constraints on global variables and always processes
them last, when no dirty clusters are available.
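
The first three rules can be sketched with a small union-find structure that carries a "global" flag. This is a hedged illustration: the class, its methods, and the integer encoding of constraint variables are assumptions made for the example, and the sketch omits clusters, scheduling, and re-marking the components of a variable that becomes global later.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names and data layout are hypothetical, not SEMI's. */
final class GlobalMarkSketch {
    private final List<Integer> parent = new ArrayList<>();
    private final List<Boolean> global = new ArrayList<>();

    /** Creates a new constraint variable and returns its index. */
    int fresh(boolean isGlobal) {
        parent.add(parent.size());
        global.add(isGlobal);
        return parent.size() - 1;
    }

    /** Returns the representative of x after any unifications. */
    int find(int x) {
        while (parent.get(x) != x) {
            x = parent.get(x);
        }
        return x;
    }

    boolean isGlobal(int x) {
        return global.get(find(x));
    }

    /** Unification: the resulting variable is global if either input was. */
    void unify(int a, int b) {
        int ra = find(a);
        int rb = find(b);
        if (ra == rb) {
            return;
        }
        parent.set(ra, rb);
        global.set(rb, global.get(ra) || global.get(rb));
    }

    /** Instance constraint from t to u; returns true if the constraint must be kept. */
    boolean addInstance(int t, int u) {
        if (isGlobal(t)) {
            unify(t, u); // set t = u and drop the instance constraint
            return false;
        }
        return true;
    }

    /** Component constraint from t to v: a component of a global is global. */
    void addComponent(int t, int v) {
        if (isGlobal(t)) {
            global.set(find(v), true);
        }
    }

    public static void main(String[] args) {
        GlobalMarkSketch s = new GlobalMarkSketch();
        int g = s.fresh(true);
        int u = s.fresh(false);
        System.out.println(s.addInstance(g, u) + " " + s.isGlobal(u));
    }
}
```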

7.6.4  Exceptions 

SEMI  encodes  exceptions  thrown  by  methods  as  auxiliary  result  components  of  method 
types. In real Java programs, as far as SEMI can tell, any exception thrown by a method may
propagate to the top level. (This is because catch clauses that catch all exceptions always
rethrow the caught exception, and in the case of selective catch clauses SEMI cannot distinguish
between the exceptions that are caught and the exceptions that are not caught.) This
means that variables corresponding to thrown exceptions (or their components) satisfy the
same constraint property given above for variables corresponding to global data. Therefore
SEMI uses the “globalization” optimization for variables corresponding to thrown exceptions.
This technique causes no loss of precision, and in practice the savings in space and
time  are  significant. 

7.7  A  Failed  Optimization:  Cut-throughs 

7.7.1  Example 

Consider the following program:

Foo f1() { return new Foo(); }

Foo f2() { return f1(); }

Foo f3() { return f2(); }

... f3() ...

Any necessary components of the new Foo will be propagated to the call site for f3. The
variables corresponding to the results of f2 and f1 will also get copies of the components.
This  is  unsatisfying  because  handling  these  semantically  meaningless  layers  of  abstraction 
could  exact  a  significant  cost  in  time  and  space  for  the  solver. 

7.7.2  Cut-throughs 

I  attempted  to  resolve  this  problem  by  introducing  a  notion  of  a  “cut-through  instance”:  a 
single  instance  constraint  that  summarizes  a  chain  of  instance  constraints.  In  the  example, 
a single cut-through instance could connect the result of “new Foo” with the result of f3.
This meant that the components of the object need not be expanded in the results of f2 and
f1.

This optimization was very difficult to implement. A large amount of bookkeeping was required to ensure
consistency,  and  it  was  tricky  to  implement  efficiently.  To  make  the  implementation 
tractable,  I  had  to  carefully  restrict  the  circumstances  in  which  cut-through  edges  could  be 
used.  Unfortunately,  experiments  showed  that  on  real  examples  cut-through  instances  were 
hardly  ever  being  used.  I  do  not  recommend  introducing  this  style  of  optimization,  and 
SEMI  does  not  perform  it. 

7.8  Reducing  the  Number  of  Initial  Constraints 

7.8.1  Dynamic  Method  Call  Resolution 

In  SEMI,  “virtual”  method  calls  are  usually  more  costly  to  treat  than  static  method  calls 
because  the  inferred  type  of  the  method  will  often  be  copied  into  the  types  of  many  objects. 
Therefore  it  is  advantageous  to  apply  a  preprocessing  step  to  reduce  as  many  dynamic 
method  calls  as  possible  to  static  ones.  This  is  implemented  in  SEMI  by  allowing  an  Ajax 
analysis  to  be  specified  as  an  optional  parameter;  SEMI  will  issue  a  query  using  this 
analysis,  and  use  the  results  to  resolve  as  many  dynamic  method  calls  as  possible. 

For  this  strategy  to  be  useful,  the  subordinate  analysis  should  be  significantly  cheaper  than 
SEMI.  My  experiments  use  RTA++  for  this  purpose. 

Ajax  provides  incremental  updates  to  the  results  of  an  analysis.  For  a  dynamic  method  call 
resolution  query,  this  means  that  a  call  site  with  multiple  possible  callees  will  initially  be 
reported  as  “dead”  (callee  set  is  empty),  then  reported  as  “statically  resolvable”  (callee  set 
is  a  singleton),  and  then  reported  as  “unresolvable”  (callee  set  has  two  or  more  elements). 
Because SEMI does not support revocation of constraints, if it were to observe the “statically
resolvable” state and immediately add appropriate constraints for static method
invocation, it would then not be able to revoke them if the state changed in the future to
indicate “unresolvable”. This would not harm correctness, but it would reduce accuracy. To
avoid this problem, the subordinate analysis is run to completion before SEMI uses its
results.

This technique also improves both performance and accuracy. Accuracy improves because
the  statically  resolved  method  call  is  treated  polymorphically  rather  than  monomorphically. 
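
The three states a call site passes through as its callee set grows can be sketched as follows. The enum, the class name, and the representation of callees as strings are hypothetical; this is not the Ajax query interface, only an illustration of why a state observed mid-run may still change.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only; not the Ajax API. */
final class CallSiteStateSketch {
    enum State { DEAD, STATICALLY_RESOLVABLE, UNRESOLVABLE }

    /** Classifies a call site by the size of its current callee set. */
    static State classify(Set<String> callees) {
        if (callees.isEmpty()) {
            return State.DEAD;
        } else if (callees.size() == 1) {
            return State.STATICALLY_RESOLVABLE;
        } else {
            return State.UNRESOLVABLE;
        }
    }

    public static void main(String[] args) {
        // Incremental updates only ever add callees, so the state moves forward.
        Set<String> callees = new HashSet<>();
        System.out.println(classify(callees));
        callees.add("Bar.m");
        System.out.println(classify(callees)); // acting on this state would be premature
        callees.add("Baz.m");
        System.out.println(classify(callees));
    }
}
```

Because the intermediate state can be superseded, a consumer that cannot revoke its decisions should read the classification only after the subordinate analysis has finished.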

7.8.2  Lazy  Method  Slot  Stuffing 

The  initial  constraints  install  an  instance  of  each  method  implementation’s  signature  into 
the signature for each class C which uses that method implementation. The SEMI implementation
delays installing such an instance until it has been determined that that class’s
method slot may actually be used, i.e., an invokevirtual instruction calls the appropriate
method on a class that is a superclass of (or equal to) C. Thus, nonstatic methods of
a class which are not actually called will usually not contribute to C’s inferred type information;
this vastly reduces the amount of work for SEMI.

The  determination  of  which  nonstatic  methods  may  actually  be  called  takes  advantage  of 
the  information  recovered  for  dynamic  method  call  resolution. 

7.8.3  Instance  Suppression 

If  a  polymorphic  value  in  the  program  has  only  one  instance,  one  loses  no  accuracy  by 
treating it as if it were not polymorphic. Suppose the label for the instance is i. Then all
instance  constraints  labelled  i  can  be  replaced  with  equality  constraints.  This  can  greatly 
reduce  the  number  of  variables  and  constraints  in  the  system.  This  optimization  is  used  in 
the  following  situations: 

•  Instructions  with  only  one  predecessor  in  the  control-flow  graph  for  their  method  need 
not  be  treated  polymorphically.  This  provides  a  vast  saving. 

•  Methods  called  from  only  one  call  site,  where  the  callee  is  statically  known,  need  not  be 
treated  polymorphically.  The  information  required  to  implement  this  is  gathered  in 
much  the  same  way  as  for  dynamic  method  call  resolution,  discussed  above. 

•  Classes  created  at  only  one  site  need  not  be  treated  polymorphically. 
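
The selection of instance labels that may be replaced by equality constraints can be sketched as below. The input map (polymorphic value to the set of instance labels that instantiate it) and all names are assumptions for illustration; how SEMI actually records this information is not specified here.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch only; names and input shape are hypothetical. */
final class InstanceSuppressionSketch {
    /**
     * Returns the instance labels belonging to polymorphic values that have
     * exactly one instance; constraints with these labels can become equalities.
     */
    static Set<String> suppressibleLabels(Map<String, Set<String>> labelsByValue) {
        Set<String> result = new TreeSet<>();
        for (Set<String> labels : labelsByValue.values()) {
            if (labels.size() == 1) {
                result.addAll(labels);
            }
        }
        return result;
    }
}
```

For example, a method called from a single call site contributes its one label to the result, while a method with two call sites contributes none.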

7.8.4  Disabling  Intra-method  Polymorphism 

As  mentioned  in  Section  6.3.8,  control  transfers  within  a  method  are  modelled  as  function 
calls,  and  instructions  at  control  flow  merge  points  can  be  treated  as  polymorphic  functions 
with  multiple  callers  (one  caller  for  each  incoming  control  flow  path).  In  practice,  however, 
allowing  such  instructions  to  be  treated  polymorphically  provides  little  or  no  accuracy 
benefit,  and  imposes  a  significant  burden  on  performance.  Therefore  I  have  turned  this 
option  off  for  all  my  experiments;  all  control  transfers  are  treated  non-polymorphically. 

7.8.5  Structural  Shortcuts 

In  the  formal  presentation,  I  have  sets  of  variables  for  the  stack  (S),  local  variable  file  (L), 
and  global  variable  table  (G).  The  former  two  sets  of  variables  can  be  (and  are)  eliminated, 
along  with  the  component  constraints  binding  them  to  particular  stack  and  local  variable 
elements,  by  “pre-solving”  those  constraints.  In  the  implementation  this  amounts  to  a  form 
of  def-use  analysis,  and  greatly  reduces  the  number  of  constraints  generated.  (However, 
since  these  constraints  are  always  local  to  a  method,  the  overall  performance  impact  may 
be limited.) This optimization is performed even when intra-method polymorphism is
enabled; in that case, the constraint generator “manually” adds the correct instance
constraints  that  would  have  been  propagated  from  the  constraints  on  the  Ss  and  Ls. 

The  globalization  optimization  described  above  in  Section  7.6  facilitates  the  removal  of 
explicit  variables  and  constraints  for  the  global  variable  table.  Variables  for  individual 
globals  are  resolved  directly  to  their  top-level  variables,  and  no  constraints  involving  the 
Gs  need  be  recorded. 

7.9  Reducing  the  Number  of  Inferred  Constraints 

7.9.1  Component  Partitioning 

Consider  a  Java  class  C  with  a  number  of  (possibly  inherited)  fields  or  methods,  and  a 
constraint  variable  v,  which  in  some  traces  corresponds  to  objects  of  class  C.  The  variable 
v  may  have  a  number  of  component  constraints,  as  illustrated  in  Figure  7-11.  Each 
component  constraint  generates  an  advertisement  at  each  instance. 


Figure  7-11.  Advertisement  proliferation 


Suppose  we  partition  the  fields  of  C.  We  then  replace  a  direct  component  constraint  for  a 
field  with  a  pair  of  constraints,  one  identifying  the  partition,  and  one  identifying  the  actual 
field within the partition. Continuing the above example, suppose that there are two equal-sized
partitions. The result is shown in Figure 7-12.

If  a  single  partitioning  scheme  is  used  consistently  everywhere,  the  results  obtained  will  be 
identical  to  those  obtained  by  the  simple  constraint  system.  As  this  example  shows,  the 
partitioned  component  constraints  may  require  fewer  advertisements  to  be  generated, 
although  more  component  constraints  are  required. 

A  simple  and  natural  partitioning  scheme  is  to  have  one  partition  for  each  Java  class  and 
assign  the  component  constraint  for  a  field  or  method  to  the  class  in  which  that  field  or 
method  is  declared.  A  more  elaborate  scheme  would  be  to  form  a  hierarchy  of  partitions 
corresponding  to  the  class  hierarchy  of  the  program. 

Section 9.5.4 compares performance results for the different schemes. The simple partitioning
scheme is superior to the elaborate scheme, and is also superior to no partitioning.
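
Under the simple scheme, a field's component constraint is reached through two labels: one for the declaring class (the partition) and one for the field within it. The following sketch shows that split for a fully qualified field name. The string encoding and the helper name are hypothetical and serve only to make the two-step lookup concrete.

```java
/** Illustrative sketch only; the string encoding of labels is an assumption. */
final class PartitionSketch {
    /**
     * Splits "pkg.Class.field" into { partition label, field label }, using the
     * declaring class as the partition.
     */
    static String[] componentPath(String qualifiedField) {
        int dot = qualifiedField.lastIndexOf('.');
        if (dot <= 0 || dot == qualifiedField.length() - 1) {
            throw new IllegalArgumentException("expected Class.field: " + qualifiedField);
        }
        return new String[] {
            qualifiedField.substring(0, dot),
            qualifiedField.substring(dot + 1)
        };
    }
}
```

With this encoding, all fields declared in one class share a single first-level component, so an instance needs an advertisement for the partition rather than one per field.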


7.10  Suppressing  Components:  Modality 

7.10.1  Example 

Consider  the  following  Java  code: 

Foo x = b ? new Bar() : new Baz();
println(x.kitty);

The  advertisement  algorithm  does  not  perform  well  on  this  code.  Consider  Figure  7-13. 
Suppose  Tx  is  the  constraint  variable  associated  with  x.  For  each  dynamically  dispatched 
method  m  defined  in  both  classes  Bar  and  Baz,  Tx  will  get  two  advertisements  for 
component  m,  one  from  Bar  and  one  from  Baz.  If  the  method  implementations  are 
different,  then  the  advertisements  will  have  conflicting  sources,  so  the  structure  of  the 
method’s inferred type will be expanded (forming the unification of the types of Bar’s m
and Baz’s m). This can result in a large number of unnecessary constraints.

7.10.2  Approach 

SEMI  annotates  component  constraints  with  mode  information  indicating  how  that 
component is used. A component constraint is written $t \rhd_c^{\emptyset} u$, $t \rhd_c^{c} u$, $t \rhd_c^{d} u$, or $t \rhd_c^{cd} u$.
The superscript “c” means that the component is used in “constructor” mode. The superscript
“d” means that the component is used in “destructor” mode. The superscript “∅”
means that the component is not used in any mode; “cd” means that the component is used
in both modes.

The  idea  comes  from  the  realm  of  functional  languages.  In  that  domain,  component 
constraints  are  associated  with  the  use  of  type  constructors,  such  as  the  arrow  type  for 
functions.  The  type  rules  for  these  languages  have  two  forms:  one  form  that  introduces  a 
new  occurrence  of  the  constructor  (“constructor  mode”),  e.g.,  the  “lambda”  rule  for 
creating  a  new  function,  and  another  form  that  eliminates  an  occurrence  of  the  constructor 
and  uses  the  components  (“destructor  mode”),  e.g.,  the  “app”  rule  for  applying  a  function. 
The  intuition  I  rely  on  is  that  if  a  component  is  not  used  in  both  constructor  and  destructor 
modes, then no useful information is transmitted through it. For example, if a function type
is introduced through the “lambda” rule but is never subject to the “app” rule, then it does
not matter what its components are. Similarly, if there is an “app” with no corresponding
“lambda” then the components do not matter. (In this case, the code performing the application
must be dead.)

When  SEMI  gathers  constraints  from  the  original  Java  bytecode  program,  it  adds  mode 
annotations  to  the  component  constraints  as  follows: 

•  Installing  a  method  implementation  into  a  new  object  type  adds  a  component  constraint 
in  constructor  mode. 

•  Calling  a  virtual  method  in  an  object  type  adds  a  component  constraint  in  destructor 
mode. 

•  Writing  a  field  of  an  object  type  adds  a  component  constraint  in  constructor  mode. 

•  Reading  a  field  of  an  object  type  adds  a  component  constraint  in  destructor  mode. 

•  Calling  a  method  adds  parameter  and  result  component  constraints  to  the  method  type 
in  destructor  mode. 

•  Declaring  a  method  adds  parameter  and  result  component  constraints  to  the  method 
type  in  constructor  mode. 

This  mode  information  changes  the  interface  to  the  solver  and  its  specification.  The 
relevant  change  is  in  the  definition  of  closure.  The  following  parts  of  the  definition  of 
closure  are  altered: 

• Component propagation rule

Components propagate through instances, with nondecreasing modes:

$\{ t \preceq_i u,\ t \rhd_c^{m} v \} \subseteq C \Rightarrow \exists w, m'.\ \{ u \rhd_c^{m'} w \} \subseteq C \wedge m \subseteq m'$

The benefit of modes is that we can safely inhibit some instance propagation.

• Instance propagation rule

$\{ t \preceq_i u,\ t \rhd_c^{m} v,\ u \rhd_c^{m'} w \} \subseteq C \wedge (\exists y, z.\ u \preceq_C y \wedge \{ y \rhd_c^{cd} z \} \subseteq C) \Rightarrow \{ v \preceq_i w \} \subseteq C$

The  instance  constraint  is  only  propagated  to  the  component  if  there  is  some  transitive 
instance  of  the  component  constraint  that  is  used  in  both  constructor  and  destructor 
mode.  Otherwise  the  instance  constraint  need  not  be  propagated. 

7.10.3  Solver  Rules 

The  solver  rules  given  in  previous  sections  remain  in  force.  Rules  that  match  a  component 
constraint  match  any  mode  annotation.  Rules  that  add  component  constraints  add 
constraints  with  the  “no  mode”  annotation.  We  introduce  a  separate  rule  to  propagate 
annotation  information: 

•  Mode  propagation 

Upon detecting $\{ t \preceq_i u,\ t \rhd_c^{m} v,\ u \rhd_c^{m'} w \} \subseteq C$ for some t, u, v, i, c, w, m and m',
replace “$u \rhd_c^{m'} w$” with “$u \rhd_c^{m \cup m'} w$”.

• Instance propagation

Upon detecting $\{ t \preceq_i u,\ t \rhd_c^{m} v,\ u \rhd_c w \} \subseteq C$ for some t, u, v, w, i, c, and m,
if $\exists y, z.\ u \preceq_C y \wedge y \rhd_c^{cd} z$, then add constraint $v \preceq_i w$ (if not already present).
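
A minimal sketch of the mode bookkeeping follows, assuming that modes are sets over {constructor, destructor} and that mode propagation merges the source's modes into the instance's annotation by set union (an assumption consistent with the requirement that modes be nondecreasing, but not confirmed by the text). All names are hypothetical.

```java
import java.util.EnumSet;

/** Illustrative sketch only; the union-based merge is an assumption. */
final class ModeSketch {
    enum Mode { CONSTRUCTOR, DESTRUCTOR }

    /** Mode propagation: the instance's annotation absorbs the source's modes. */
    static EnumSet<Mode> propagate(EnumSet<Mode> source, EnumSet<Mode> instance) {
        EnumSet<Mode> merged = EnumSet.copyOf(instance);
        merged.addAll(source);
        return merged;
    }

    /** True when the annotation is "cd", i.e. used in both modes. */
    static boolean usedInBothModes(EnumSet<Mode> modes) {
        return modes.containsAll(EnumSet.allOf(Mode.class));
    }

    public static void main(String[] args) {
        EnumSet<Mode> written = EnumSet.of(Mode.CONSTRUCTOR); // e.g. a field write
        EnumSet<Mode> read = EnumSet.of(Mode.DESTRUCTOR);     // e.g. a field read
        System.out.println(usedInBothModes(propagate(written, read)));
    }
}
```

In this sketch, instance propagation for a component would be enabled only once some transitive instance of it satisfies `usedInBothModes`.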

7.10.4  Example 

The  example  above  is  transformed  to  the  following: 


7.10.5  Implementation 

These  rules  are  not  difficult  to  implement,  and  cost  very  little  in  time  and  space.  Mode 
propagation  takes  place  along  with  the  other  work  on  each  dirty  constraint  from  the 
worklist. The instance propagation check is performed very efficiently by tracking, for each
$t \rhd_c v$, whether there is an instance of the component with the “cd” annotation; this
“instance mode” information is propagated from instances to sources.

7.10.6  Detecting  Unused  Fields 

Suppose  that  F  is  a  field  of  some  class,  and  e  is  a  bytecode  expression,  where  in  some  traces 
e  evaluates  to  real  objects,  but  none  of  those  objects  ever  have  the  field  F.  Because  SEMI 
is sound, it will determine that the relation “e ↔ e” holds. This means that SEMI has a
translation for e into some constraint variable u. Now consider checking the relation
“e.F ↔ e.F”. SEMI will translate both occurrences of “e.F” into some constraint variable v
such that $u \rhd_F v$. SEMI will therefore conclude that “e.F ↔ e.F” holds, even though it does
not hold in the true relation (because the assumptions indicate that “e.F” never evaluates to
any value). For some analyses, such as object modelling (see Chapter 11), it is important to
be  able  to  detect  that  such  fields  are  actually  unused. 

The  SEMI  solution  is  illustrated  in  Figure  7-15. 


Suppose that we have two expressions $e_1$ and $e_2$, where $e_1$ maps to constraint variable u and
$e_2$ maps to constraint variable v. The two expressions are related because u and v have a
common instance t. However, instead of taking u and v to be the constraint variables for the
expressions, I insert the “Q-d-constraints” indicated in boxes, and assign u' and v' as the
constraint variables for the expressions. Also, for each constraint variable $N_{classID}$ representing
the prototypical object of each class, I insert the “Q-c-constraints” indicated in the
box. Q is a single predefined component and instance label.

Now if, in fact, $e_1$ and $e_2$ can both evaluate to a single real object, then the soundness of
SEMI guarantees that for some classID there will be a chain of instances leading from
$N_{classID}$ to the common instance t. Therefore t will have a component “$t \rhd_Q^{cd} w$” for some
w, and instance chains will be created leading from u' to w and from v' to w. Therefore
SEMI’s analysis of the instance graph will deduce that $e_1$ and $e_2$ are related.

On the other hand, if $e_1$ and $e_2$ do not evaluate to any actual objects, then there may be no
such classID such that t is transitively an instance of $N_{classID}$. In that case t will have the
component “$t \rhd_Q^{d} w$”, i.e., the constructor mode will not be present. Therefore instance
chains will not be created leading from u' to w or from v' to w, and SEMI will not deduce
that $e_1$ and $e_2$ are related.

7.11  Nondeterministic  Virtual  Method  Calls 

A large contributor to the size of the constraint sets is the presence of structures corresponding
to “method types” in the signatures of objects. This is a direct consequence of the
way  SEMI  encodes  virtually-invoked  methods:  as  first-class  functions  carried  in  the  slots 
of  objects.  The  burden  of  having  method  types  in  object  signatures  can  be  eliminated  by 
encoding  each  virtual  method  call  as  a  nondeterministic  call  to  one  of  the  possible  callees 
for  that  call  site.  The  set  of  callees  at  each  call  site  can  be  determined  by  some  simpler 
algorithm  (e.g.,  RTA++). 

This  transformation  effectively  reduces  the  program  to  first-order  code,  and  allows  Ajax  to 
handle  significantly  larger  examples.  Of  course,  the  penalty  is  that  the  analysis  results  may 
be  of  lower  quality  because  higher-order  control  flow  is  not  tracked  as  effectively.  On  the 
other  hand,  accuracy  can  improve  for  some  examples,  because  at  each  virtual  call  site  we 
can  use  a  fresh  polymorphic  instance  of  the  type  of  the  callee.  In  the  standard  mode, 
because  the  callee  is  extracted  from  a  slot  of  an  object  passed  in  as  a  parameter,  its  type 
cannot be used polymorphically. In practice we find that accuracy does decrease somewhat.
The  effects  are  quantified  in  Chapter  9. 

Ajax  does  not  actually  generate  transformed  representations  of  programs.  SEMI  is 
configured  with  an  arbitrary  “preparatory”  analysis,  and  then  issues  queries  against  the 
preparatory analysis to compute the sets of possible callees at each call site.

7.12  Future  Work  and  Related  Work 

Each  of  these  optimizations  (except  for  cut-throughs)  made  significant  improvements  to  the 
performance  of  Ajax.  However,  there  are  additional  possibilities  for  optimizing  the  system. 
For example, there seem to be further opportunities to reduce space by implicitly representing
some instance/component constraints and reconstructing them on demand.
However, SEMI already seems too complex, and the generality of the constraint system
seems to slow it down, especially compared to non-constraint-based polymorphic type
inference systems [69] [54]. It remains unclear which strategies offer the best opportunities
for  future  performance  improvements. 

Other  researchers  [31]  have  described  how  to  improve  the  accuracy  of  this  kind  of  analysis 
by  labelling  polymorphic  instance  constraints  as  “positive”  and/or  “negative”,  encoding  a 
simple  kind  of  directionality  information.  For  example,  function  results  are  instantiated 
with  “positive”  instance  constraints,  and  function  arguments  are  instantiated  with 
“negative”  instance  constraints.  This  feature  could  easily  be  added  to  SEMI. 

The  SEMI  algorithm  is  superficially  similar  to  other  analysis  engines  based  on 
polymorphic  recursion  [31],  since  they  are  all  based  on  Henglein’s  algorithm.  However, 
SEMI  is  the  only  engine  that  attempts  to  combine  polymorphic  recursion  with  handling  of 
structures  with  multiple  fields.  The  presence  of  types  with  a  high  degree  of  “fan-out”  in 
their representation graphs motivates many of the improvements to SEMI.

8 Analyzing The Inscrutable


8.1  Introduction 

This  chapter  discusses  several  features  of  Java  that  pose  fundamental  problems  to  practical, 
sound,  whole-program  static  analysis,  and  presents  Ajax’s  strategies  for  dealing  with  them: 

•  Foreign  and  unknown  code 

•  Reflection  and  serialization 

•  The  Java  String  “constant  pool” 

8.2  Foreign  and  Unknown  Code 

8.2.1  Foreign  Code 

One  goal  of  Ajax  is  to  produce  sound  results:  The  results  of  an  analysis  must  account  for 
all  possible  runtime  behaviors  of  the  program.  I  have  described  methods  for  such  analysis 
of  programs  which  are  completely  described  by  Java  bytecode.  However,  all  real  Java 
programs  depend  on  the  behavior  of  components  that  are  not  described  by  Java  code.  For 
example,  the  standard  Java  class  library  depends  on  “native  code”  libraries  for  some  of  its 
functionality. 

In  many  languages  and  environments  foreign  code  is  essentially  subservient,  providing 
support  to  the  main  system  but  influencing  it  only  in  limited  ways.  For  example,  all  realistic 
languages  provide  input  and  output  routines.  However,  the  effects  of  simple  routines  like 
“print  a  string”  and  “read  a  string”  are  easily  accounted  for:  “print  a  string”  can  be  ignored, 
and  “read  a  string”  can  be  treated  as  code  that  creates  a  String  object  and  fills  it  with  an 
unknown  number  of  unknown  characters. 

In  Java,  interaction  between  foreign  code  and  Java  code  is  much  richer.  Foreign  code  in 
standard  libraries  such  as  the  Abstract  Window  Toolkit  modifies  Java-visible  data 
(including  variables  holding  object  references,  affecting  aliasing),  calls  Java  methods,  and 
creates  new  Java  objects.  If  these  behaviors  are  ignored,  then  some  of  the  program’s  live 
methods  will  appear  to  be  dead,  and  some  of  the  program’s  instantiated  classes  will  appear 
not  to  be  instantiated. 

Foreign  code  also  initializes  the  Java  environment  and  transfers  control  to  the  Java  program 
in an appropriate state. This code can be complex for programs packaged as “applets” or
“servlets”.

8.2.2 Unknown Code

The  question  of  how  to  handle  “foreign  code”  generalizes  immediately  to  the  question  of 
how  to  handle  “unknown  code,”  which  may  be  foreign  or  may  simply  be  Java  code  that  is 
inaccessible  to  the  analysis.  For  example,  some  tasks  require  that  an  application  be 
analyzed  independently  of  the  implementation  of  the  Java  libraries.  One  such  task  is 
stripping  dead  code  from  an  application  being  packaged  for  execution  on  multiple  different 
Java  virtual  machines,  each  with  its  own  implementation  of  the  standard  libraries  [79]. 

Ajax  requires  access  to  all  Java  bytecode  for  a  program.  The  solutions  that  I  discuss  in  this 
chapter  are  only  applied  to  foreign  code.  However,  the  techniques  and  most  of  the 
discussion  are  certainly  applicable  to  unknown  code  and  modular  analysis  in  general. 

8.2.3  Possible  Approaches 

One approach is simply to make “worst case” assumptions about foreign code. Unfortunately,
foreign code is almost all-powerful in Java. Most foreign code interacts with the
Java  virtual  machine  through  the  prescribed  “Java  Native  Interface”,  but  that  interface 
allows  the  code  to  do  almost  anything.  Some  foreign  code  bypasses  JNI  and  accesses  Java 
program  state  directly.  Therefore,  if  one  makes  worst  case  assumptions  about  the  behavior 
of  foreign  code,  little  can  be  known  about  the  behavior  of  Java  programs. 

Another  approach  is  to  make  pessimistic  assumptions  about  foreign  code,  tempered  with 
“realistic”  assumptions  limiting  the  code’s  behavior.  For  example,  we  may  assume  that  the 
foreign  code  used  by  the  standard  Java  libraries  has  no  knowledge  of  user  application  code, 
and  will  therefore  not  create  application  objects,  modify  the  state  of  such  objects  or  directly 
call  methods  on  those  objects.  However,  this  assumption  does  not  help  us  analyze  the 
standard  Java  libraries.  It  is  also  possible  for  applications  to  pass  knowledge  —  such  as  the 
names  of  application  classes  and  methods  —  down  into  the  standard  libraries,  that  can  then 
be  used  to  violate  assumptions  about  reasonableness. 

The  latter  approach  is  feasible,  but  very  conservative,  making  it  difficult  to  evaluate  the 
effectiveness  of  the  actual  analysis  engines  and  Ajax  tools.  Therefore  I  have  taken  a  third 
approach:  manual  specification  of  the  behavior  of  all  foreign  code. 

8.3  Salamis:  A  Specification  Language  for  Foreign  Code 

8.3.1  The  Need  For  A  Separate  Specification  Language 

One  way  to  specify  foreign  code  is  to  write  a  Java  bytecode  “dummy  implementation”  of 
each  foreign  subroutine.  My  previous  system,  Lackwit,  took  this  approach  of  writing 
dummy  implementations  in  C.  This  has  the  advantage  of  requiring  little  or  no  work  on  the 
part  of  the  analysis  implementor,  and  providing  a  familiar  language  to  the  specification 
writer. 

Experience  with  Lackwit  revealed  a  serious  problem  with  this  approach:  it  is  difficult  to 
write  dummy  implementations,  because  it  is  unclear  which  implementation  details  are 
relevant  to  the  analysis  and  which  are  not.  This  is  true  even  when  the  specification  writer 
is  the  same  person  who  implemented  the  analysis.  Use  of  multiple  complex  analyses 
exacerbates the problem.

Therefore I created a dedicated specification language for foreign code, called Salamis¹.
Salamis has limited expressivity; for example, there is no arithmetic, and conditional
branches are completely nondeterministic. The specification writer is forced to abstract
away from details that are irrelevant to most large-scale analyses.

To  reduce  the  effort  required  for  parsing  and  analysis,  I  made  the  language  as  simple  as 
possible. 

8.3.2  Example  and  Overview 

Consider  the  Java  code  fragment  in  Figure  8-1. 


    FileDescriptor myFD = new FileDescriptor();
    FileInputStream stream = new FileInputStream(myFD);
    stream.open();

Figure 8-1. Application code using native methods

Suppose the programmer wishes to find code that modifies her FileDescriptor object.
The FileDescriptor is modified by the native method FileInputStream.open,
but this knowledge is only available in native code specifications.

Figure 8-2 shows some code from the standard library code specification that defines the
behavior of the native method open in the class java.io.FileInputStream.


    _stringconst() {
        return = java.lang.String#internstr;
    }

    makeIOException() {
        STR = _stringconst();
        EXN = new java.io.IOException;
        java.io.IOException.<init>(EXN);
        java.io.IOException.<init>(EXN, STR);
        return = choose EXN;
    }

    java.io.FileInputStream.open(THIS, NAME) {
        FD = THIS java.io.FileInputStream.fd;
        NEW_OS_FD = choose;
        FD java.io.FileDescriptor.fd := NEW_OS_FD;
        throw = makeIOException();
    }

Figure 8-2. Specification for java.io.FileInputStream.open

Each block delimited by braces defines a Salamis function. Each Salamis function either
defines a native method with a fully qualified method name, such as
“java.io.FileInputStream.open”, or defines an internal function, such as
“makeIOException”, to be used by other specifications.

1. “Salamis” is the name of the island on which Ajax is said to have been buried.

Statements  within  blocks  are  delimited  by  semicolons.  Each  statement  evaluates  a  simple 
expression,  with  the  result  optionally  assigned  to  some  local  variable  (using  the  syntax 
“A  =  B”). 

The expression “FD = THIS java.io.FileInputStream.fd” reads the contents
of the fd field declared in java.io.FileInputStream from the object referred to by
THIS, and stores the resulting reference in local variable FD. Note that in Salamis all “this”
parameters are explicit. There is no syntactic distinction between static and non-static
methods. Note also that all method and field names are fully qualified with the name of
their class; this avoids the need to have any static type information associated with Salamis
local variables.

The statement “NEW_OS_FD = choose;” creates an undetermined scalar value and
stores it in the local variable NEW_OS_FD. This statement models the retrieval of some
unknown file descriptor value from the operating system.

The statement “FD java.io.FileDescriptor.fd := NEW_OS_FD;” stores the
value of NEW_OS_FD into the fd field of the object referenced by FD. Syntactically, this
is actually a “store expression” that is not assigned into any local variable. Note that the
fd field here is different from the field read above. Also note that writing “FD
java.io.FileDescriptor.fd := choose;” directly would be syntactically
invalid, because every statement has exactly one expression.

The constructor of FileInputStream called in Figure 8-1 internally sets the stream’s
fd field to myFD. Static analysis then reveals that myFD’s own fd field can be modified
by the call to FileInputStream.open. This information is reported to the
programmer.
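
To make the aliasing in this example concrete, the following is a minimal Java sketch of the behavior that the specification describes. The classes ModelFileDescriptor, ModelFileInputStream and AliasingDemo are hypothetical stand-ins invented for illustration; they are not the real library classes and not part of Ajax, and the constant 42 merely stands in for the nondeterministic “choose”.

```java
// Hypothetical stand-ins for java.io.FileDescriptor and java.io.FileInputStream.
// They model only the two fields named in the Salamis specification.
class ModelFileDescriptor {
    int fd = -1;  // the operating-system descriptor value
}

class ModelFileInputStream {
    final ModelFileDescriptor fd;  // the stream's reference to its descriptor

    ModelFileInputStream(ModelFileDescriptor descriptor) {
        this.fd = descriptor;  // the constructor stores the caller's object
    }

    // Mirrors the specification of open: read THIS.fd, then store an
    // unknown scalar into that object's own fd field.
    void open(int valueFromOperatingSystem) {
        ModelFileDescriptor target = this.fd;
        target.fd = valueFromOperatingSystem;
    }
}

class AliasingDemo {
    static int run() {
        ModelFileDescriptor myFD = new ModelFileDescriptor();
        ModelFileInputStream stream = new ModelFileInputStream(myFD);
        stream.open(42);  // 42 stands in for the value picked by "choose"
        return myFD.fd;   // myFD was modified through the alias stream.fd
    }
}
```

The write inside open never mentions myFD by name; it reaches the object only through the stream's fd field, which is the kind of access a purely textual search would miss.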

8.3.3  Salamis  Syntax 

The  grammar  of  Salamis  is  presented  in  Figure  8-3.  Apart  from  the  literal  strings  shown  in 
the grammar, the only tokens are Identifiers and quoted Strings.

The  core  of  the  language  is  the  expressions: 

•  Object creation, e.g.,

new java.io.IOException

The  object  constructor  must  be  called  explicitly  in  a  separate  statement. 

•  Nondeterministic  choice,  e.g., 

choose  EXN 

The  result  of  the  expression  is  chosen  nondeterministically  from  the  comma-separated 
list  of  operands.  In  this  example  there  is  only  one  operand,  so  the  expression  simply 
evaluates  to  the  value  of  EXN.  If  the  list  is  empty,  then  the  result  is  a  fresh,  unknown 
scalar value.

CompilationUnit ::= Function*

Function        ::= Name ( Identifiers? ) { Statement* }

Name            ::= Identifier
                  | Identifier . Name
                  | Identifier # Name

Identifiers     ::= Identifier
                  | Identifier , Identifiers

Statement       ::= Label? goto Identifiers ;
                  | Label? Definition? Expression ;

Label           ::= Identifier :

Definition      ::= Identifier =

Expression      ::= new Name
                  | choose Identifiers?
                  | Identifier? Name
                  | Identifier? Name := Identifier
                  | Name ( Identifiers? ) String?
                  | catch ( Name? ) Identifiers

Figure 8-3. Salamis grammar


•  Object  field  access,  e.g., 

THIS java.io.FileInputStream.fd

This  expression  extracts  the  value  of  the  named  field  from  the  object  referred  to  by  the 
operand.  The  first  operand  is  omitted  if  and  only  if  the  field  is  static. 

•  Object  field  assignment,  e.g., 

FD java.io.FileDescriptor.fd := NEW_OS_FD

The  value  of  the  field  is  set  to  the  second  operand.  The  first  operand  is  omitted  if  and 
only  if  the  field  is  static. 

•  Method  call,  e.g., 

java.io.IOException.<init>(EXN)

The named method is called with the provided parameters. If the method is static,
private, a constructor (method named <init>), or final, then a static method
call is used; otherwise a dynamic method call is used. The result of the expression is the
value returned by the method, if any.

An optional quoted string is allowed. This string contains the Java type signature of the
method to call, in Java bytecode format (e.g., "([C)V" for a method taking an array of
characters and returning void). Using this signature, Salamis can unambiguously call
overloaded methods. Note that the JVM requires native methods to be uniquely named,
so there is no need to define overloaded methods in Salamis.

•  Salamis function call, e.g.,

_stringconst()

This  is  syntactically  the  same  as  a  method  call,  but  no  class  name  is  present  in  the 
method  name.  All  Salamis  function  calls  are  static  (i.e.,  Salamis  functions  are  not  first- 
class). 

•  Exception  catching,  e.g., 

BYTE = java.io.ObjectInputStream.readByte(THIS);
catch (java.lang.Throwable) BYTE

This  expression  catches  exceptions  which  are  subclasses  of  Throwable  and  thrown 
by  the  statement  assigning  BYTE.  The  result  of  the  expression  is  any  caught  exception. 
If  not  caught,  exceptions  are  not  propagated  through  Salamis  code;  they  are  simply 
ignored.  Therefore  exceptions  must  be  explicitly  propagated  from  callee  to  caller.  If  no 
class  bound  is  given,  all  exceptions  are  caught. 

•  There  is  one  kind  of  statement  that  is  not  an  expression:  “goto”,  e.g., 

goto  B,  S,  C,  I,  J,  Z,  F,  D,  L 

Control  is  transferred  to  one  of  the  labelled  statements.  Statements  are  labelled  by 
prepending  them  with  the  label  name  and  a  colon. 
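
The binding rule stated for method calls in the list above (static, private, final, or a constructor means a static call; anything else means a dynamic call) can be sketched in Java as follows. This is only an illustration under my own naming: the class CallKind and the method isStaticCall are hypothetical, and the sketch assumes the method's access flags are available as a java.lang.reflect.Modifier bit set.

```java
import java.lang.reflect.Modifier;

class CallKind {
    // Returns true when the rule selects a static (non-virtual) call,
    // and false when a dynamic (virtual) call would be used instead.
    static boolean isStaticCall(String methodName, int modifiers) {
        return Modifier.isStatic(modifiers)
            || Modifier.isPrivate(modifiers)
            || Modifier.isFinal(modifiers)
            || methodName.equals("<init>");  // constructors are never dispatched virtually
    }
}
```

Because the decision depends only on the declared method, a specification writer does not need separate syntax for the two kinds of call.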

8.3.4  Other  Salamis  Features 

The  value  of  the  special  local  variable  “return”  is  returned  by  each  function  or  method.  The 
value  of  the  local  variable  “throw”  is  the  thrown  exception,  if  any.  Salamis  specifications 
do  not  specify  whether  an  exception  is  thrown  or  the  method  (or  function)  returns  normally. 

Every statement that does not assign to a local variable is conditional; it may or may not
actually execute. Therefore in makeIOException, it is unspecified whether one, both,
or none of the IOException constructors (methods named <init>) are executed.
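
As a small illustration of what this conditionality means, the sketch below enumerates the behaviors that a sound analysis has to allow for when a function contains several statements that do not assign to a local variable. The class ConditionalStatements and its method are hypothetical names of mine; the code only counts subsets and says nothing about how any Ajax analysis is implemented.

```java
import java.util.ArrayList;
import java.util.List;

class ConditionalStatements {
    // Returns every subset of the given conditional statements, in the order
    // produced by counting a bit mask from 0 up to 2^n - 1.
    static List<List<String>> possibleExecutions(List<String> conditional) {
        List<List<String>> outcomes = new ArrayList<>();
        int n = conditional.size();
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> executed = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    executed.add(conditional.get(i));
                }
            }
            outcomes.add(executed);
        }
        return outcomes;
    }
}
```

For the two constructor calls in makeIOException this yields four cases: neither call, either one alone, or both.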

Sometimes it is necessary to associate values with objects that do not belong in the fields
declared for the object in Java. One example is the lengths of arrays. For such cases,
Salamis supports synthetic “specification only” fields (called “spec fields”). Static spec
fields are also supported, e.g., java.lang.String#internstr above refers to the
global spec variable “internstr”. These fields are not declared anywhere; conceptually, they
are simply created as needed when accessed.

All updates to object fields in Salamis are treated as conditional; the previous value of the
field may persist. Thus many of the Salamis specifications use a single object reference in
a spec field to refer to a whole collection of objects. For example,
java.lang.String#internstr refers to one of the entire collection of interned string
objects; whether there is one or many is irrelevant to any analysis, because the semantics
of Salamis are the same in either case.
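
One way to picture a conditional update is as a field whose abstract content is the set of every value it may hold, so that an assignment adds a possibility rather than replacing the old one. The following Java sketch shows that idea; the class WeakField is a hypothetical illustration of mine and is not a data structure taken from Ajax.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

class WeakField<T> {
    private final Set<T> possibleValues = new LinkedHashSet<>();

    // A conditional update: the new value becomes possible, and whatever
    // the field might have held before remains possible as well.
    void assign(T value) {
        possibleValues.add(value);
    }

    // Reading the field may yield any of the values assigned so far.
    Set<T> read() {
        return Collections.unmodifiableSet(possibleValues);
    }
}
```

Under this reading, a single spec field such as #internstr behaves like a collection: every string ever stored in it is still a candidate result of a later read.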

Array accesses are treated by identifying the elements of an array object with special spec
fields of the object, depending on the type of the array: #intarrayelement,
#longarrayelement, #floatarrayelement, #doublearrayelement, and
#arrayelement (for arrays of object references). Arrays of bytes, shorts, and characters
have their contents mapped to #intarrayelement.

Sometimes it is necessary to refer to the names of array classes. These are given the internal
Java Virtual Machine names (e.g., [I for an array of integers, [Ljava.lang.Object;
for an array of objects).
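
The mapping from array classes to element spec fields can be written down as a small lookup on the internal JVM array class name. The sketch below is a hedged illustration: the class ArraySpecFields and the method elementField are hypothetical names, and I assume that arrays of arrays count as arrays of object references. Boolean arrays are not mentioned in the text, so the sketch rejects them rather than guess.

```java
class ArraySpecFields {
    // Maps an internal JVM array class name (e.g. "[I") to the spec field
    // that stands for the array's elements.
    static String elementField(String arrayClassName) {
        if (arrayClassName.length() < 2 || arrayClassName.charAt(0) != '[') {
            throw new IllegalArgumentException("not an array class name: " + arrayClassName);
        }
        switch (arrayClassName.charAt(1)) {
            case 'B': case 'S': case 'C': case 'I':
                return "#intarrayelement";    // bytes, shorts and chars share the int field
            case 'J':
                return "#longarrayelement";
            case 'F':
                return "#floatarrayelement";
            case 'D':
                return "#doublearrayelement";
            case 'L': case '[':
                return "#arrayelement";       // object references (assumed to include nested arrays)
            default:
                // The text does not say how boolean ('Z') arrays are treated.
                throw new IllegalArgumentException("unspecified element type: " + arrayClassName);
        }
    }
}
```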

8.3.5  Implementation 

Salamis  code  is  compiled  into  Java  data  structures  by  a  simple  front  end.  The  data  structures 
are  then  serialized  into  “specification  resources”  that  are  located  and  loaded  by  Ajax  at 
analysis  time. 

When  an  analysis  encounters  live  foreign  code,  it  looks  up  the  specification  and  then 
analyzes  the  specification  directly.  In  other  words,  all  analyses  have  to  be  able  to  analyze 
Java  code  and  also  Salamis  specifications.  In  practice  this  is  not  too  difficult,  although  it  is 
rather  cumbersome  and  leads  to  some  duplication  of  code. 

This  approach  also  requires  the  language  of  bytecode  expressions  to  be  extended  to  include 
Salamis  variables.  Tools  also  have  to  be  extended  to  scan  Salamis  specifications  as  well  as 
Java  bytecode. 

8.4  Salamis  Specifications 

Appendix  B  presents  the  Salamis  specifications  for  the  portion  of  the  JVM  class  library 
used  by  my  examples. 

8.4.1  Omissions 

The  specifications  cover  only  the  foreign  code  exercised  by  my  test  applications,  which 
includes  the  example  applications  for  my  thesis  plus  some  other  applications.  Also,  they 
specify  the  code  used  by  only  the  Windows  implementation  of  the  Sun  JDK  1.1.  Other  JDK 
versions  and  implementations  on  other  platforms  use  different  Java  libraries,  which  rely  on 
different  foreign  code,  and  may  therefore  need  different  Salamis  specifications.  Even  given 
these  limitations,  there  are  over  2,500  lines  of  specifications  covering  such  complex  areas 
as  the  Java  Abstract  Window  Toolkit,  which  manages  the  interaction  between  Java  and  the 
underlying  Windows  graphical  user  interface  toolkit. 

There  are  a  few  places  where  it  is  impossible  or  undesirable  to  specify  the  foreign  code 
adequately.  The  most  important  such  area  is  the  reflection  services,  which  are  discussed 
below. 

8.4.2  Risks 

The  behavior  of  foreign  code  used  by  the  Java  libraries  is  difficult  to  deduce.  Much  of  it  is 
internal  to  the  library  implementation,  and  much  of  the  rest  is  under-documented.  I  have 
proceeded  by  reverse-engineering  the  Java  library  bytecode,  and  by  observing  the  behavior 
of  the  Java  Virtual  Machine.  This  approach  is  difficult  and  error-prone.  Even  with  access 
to  the  JVM  source  code,  this  task  would  still  be  difficult;  the  JVM  and  its  libraries  are  large 
and  complicated  pieces  of  code. 

It  is  impossible  in  principle  to  rigorously  prove  that  the  specifications  actually  match  the 
behavior  of  the  foreign  code.  In  practice  it  is  also  difficult  to  test  for  conformance.  My 
testing consisted of running live code analyses using the specifications and comparing the
results to profile data gathered by running the example programs in the JVM; profiled
methods that are declared dead by the analysis clearly indicate bugs, either in the
specifications or the analysis itself. I found many incomplete specifications this way. However, it
is difficult to achieve high confidence in the completeness of the specifications.
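
The conformance check described above amounts to a set difference between the methods observed at run time and the methods the analysis considers live. A minimal sketch, assuming both sets are available as fully qualified method names (the class LivenessCheck and its method are hypothetical, not part of Ajax):

```java
import java.util.Set;
import java.util.TreeSet;

class LivenessCheck {
    // Methods that were observed running under the profiler but that the
    // analysis declared dead. Each one points to a bug in a specification
    // or in the analysis itself.
    static Set<String> wronglyDeclaredDead(Set<String> profiled, Set<String> analysedLive) {
        Set<String> suspicious = new TreeSet<>(profiled);  // sorted for stable reporting
        suspicious.removeAll(analysedLive);
        return suspicious;
    }
}
```

An empty result does not prove the specifications complete; it only shows that the particular runs profiled did not expose a gap.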

8.4.3  Handling  Strings 

One  quirk  in  the  semantics  of  the  JVM  shows  up  in  the  specification  of  certain  String 
methods.  The  JVM  maintains  a  set  of  String  objects  called  “interned  Strings”:  at 
runtime,  each  possible  string  of  characters  has  at  most  one  corresponding  “interned 
String”  object.  When  a  JVM  instruction  accesses  a  string  constant,  it  returns  a  reference 
to  the  interned  String  for  that  string  of  characters.  Also,  it  is  possible  to  obtain  the 
interned String for an arbitrary string, by calling the method String.intern(). This
facility is provided to save space, and to allow interned Strings to be compared for string
equality merely by comparing the object references.
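
The following short Java example illustrates the interning behavior just described, using only the standard String API. The class name InternDemo is an arbitrary choice for the illustration.

```java
class InternDemo {
    // Returns three comparisons between a string constant and a fresh copy of it.
    static boolean[] compare() {
        String constant = "salamis";          // string constants are interned
        String copy = new String(constant);   // a distinct object with the same characters
        return new boolean[] {
            copy == constant,                 // different objects
            copy.equals(constant),            // same characters
            copy.intern() == constant         // intern() yields the shared interned object
        };
    }
}
```

The third comparison is the one that matters for the analysis: any reference obtained from intern() may be identical to the reference obtained from any string constant.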

The  unfortunate  result  in  Ajax  is  that  every  object  reference  that  could  refer  to  a  String 
constant  must  be  related  in  the  VPR  to  every  other  object  reference  that  could  refer  to  a 
String  constant.  I  model  this  behavior  faithfully  in  order  to  satisfy  the  definition  of  the 
VPR.  Furthermore,  some  programs  can  depend  on  it  in  practice,  for  example  when  object 
references  are  compared.  This  is  why  the  Salamis  example  above  gets  String  constants 
from  the  global  #internstr  spec  field.  The  bytecode  instructions  that  fetch  references 
to  String  constants  also  get  the  reference  from  this  field.  In  many  cases  it  would  make 
sense  to  relax  this  behavior  and  support  unsound  handling  of  Strings. 

8.4.4  Other  Areas  Of  Interest 

The Salamis code for sun.awt.windows.WToolkit.eventLoop is particularly
interesting.  This  method  runs  indefinitely  on  a  special  AWT  thread,  pulling  events  from  the 
Windows  event  queue  and  processing  them.  It  responds  to  the  native  Windows  events  by 
calling  methods  on  Java  “peer”  objects  associated  with  each  underlying  Windows  interface 
object.  If  the  callbacks  are  not  modelled  correctly,  then  the  peer  object  methods  appear 
never  to  be  invoked,  and  large  chunks  of  a  program’s  code  may  never  be  triggered. 

Much  of  the  Salamis  code  is  devoted  to  ensuring  that  appropriate  exceptions  are  potentially 
thrown  by  each  method.  Also,  there  is  a  special  function  _magicexn,  which  returns  one 
of  the  exceptions  which  may  be  raised  at  any  time  by  the  Java  Virtual  Machine  (e.g., 
VirtualMachineError).  This  function  is  used  by  the  analyses  to  ensure  that  code 
which  can  catch  such  exceptions  is  handled  soundly;  the  result  of  this  function  is  added  to 
the  set  of  objects  which  may  be  caught  by  the  code.  The  _magicexn  function  also 
includes  exceptions  for  run-time  errors  that  can  occur  so  commonly  that  they  might  as  well 
be thrown anywhere, such as ArrayIndexOutOfBoundsException,
NullPointerException and ClassCastException. (These are the exceptions
belonging to the set ErrorClassIDs in the MJBC language; see Section 3.2.5.) This results
in no loss of accuracy with the existing Ajax analyses, because they do not accurately
capture which exceptions can be thrown by which methods.

8.5  Reflection  And  Serialization

8.5.1  Introduction 

An  especially  interesting  application  of  foreign  code  is  the  standard  Java  reflection  library. 
It  allows  programs  to  query  and  manipulate  the  elements  of  a  Java  program  at  run  time.  For 
example,  a  program  can  obtain,  as  a  string,  the  name  of  the  class  of  any  object.  Conversely, 
given  the  name  of  a  class  as  a  string,  it  can  create  an  object  of  the  class.  It  can  obtain  a  list 
of  the  names  of  the  fields  and  methods  of  an  object,  and  other  information  about  those 
members.  It  can  even  call  the  methods  and  modify  the  fields  by  name. 

Reflection  is  extremely  powerful  and  useful,  and  it  is  widely  used  by  real  programs.  Many 
important Java programming paradigms depend on it (for example, Java Beans).
Unfortunately, it is almost completely impervious to static analysis.

A  specialized  form  of  reflection  is  Java  serialization  —  a  facility  for  storing  and  retrieving 
object  structures  from  a  byte  stream.  Serialization  uses  reflection  to  traverse  the  contents  of 
objects  without  requiring  the  user  to  write  traversal  code  for  each  class. 

8.5.2  The  Reflection  Services 

Reflection is not an esoteric feature used by just a few applications. In fact, the Java
libraries themselves depend on it. For example, the Sun JDK library reads the name of the
current locale from a text file, prefixes it with the string sun.io.CharToByteConverter,
and then loads the class with that name and creates an object of the class.

Many  applications,  including  some  of  the  applications  I  chose  for  my  benchmark  suite,  also 
depend  on  reflection  internally.  (The  benchmark  applications  are  described  in  the  next 
chapter,  in  Section  9.2.2.)  For  example,  the  Ladybug  specification  checker  tool  [44]  has  a 
user  interface  shell  wrapped  around  an  abstract  formula  solution  engine.  The  UI  shell 
accesses  the  engine  through  a  Java  interface,  and  has  no  compile-time  dependence  on  any 
particular  implementation  of  the  interface.  At  run  time,  Ladybug  uses  reflection  to  load  the 
engine  class  by  name  and  create  an  object  of  that  class.  The  object  is  downcast  into  a 
reference  to  the  engine  interface,  and  can  then  be  used  by  the  user  interface  shell.  This 
pattern  of  using  reflection  to  break  compile-time  dependencies  is  quite  common. 
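
The pattern of loading an implementation by name and downcasting it to a known interface looks roughly like the following in Java. This is not Ladybug's code: the class ByNameLoader is a hypothetical illustration, and java.util.List plays the role of the engine interface so that the sketch depends only on the standard library.

```java
import java.util.List;

class ByNameLoader {
    // Loads a class by name at run time, creates an instance through its
    // zero-argument constructor, and downcasts the result to an interface.
    static List<Object> loadList(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Object instance = cls.getDeclaredConstructor().newInstance();
            @SuppressWarnings("unchecked")
            List<Object> list = (List<Object>) instance;  // the downcast to the interface
            return list;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load " + className, e);
        }
    }
}
```

Nothing in this code names the implementation class at compile time, so a static analysis sees only a string; a call such as ByNameLoader.loadList("java.util.ArrayList") decides the class at run time.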

Another  interesting  use  of  reflection  is  in  the  Jess  expert  system  shell  [35].  Jess  interprets 
rule  sets,  which  are  essentially  programs.  These  programs  can  contain  directives  to  create 
and  manipulate  Java  objects;  these  directives  are  interpreted  by  Jess  by  simply  passing 
them  down  to  the  Java  reflection  API  (along  with  some  wrapping  and  unwrapping  between 
Java  object  references  and  Jess  data).  By  this  simple  mechanism,  the  full  power  of  the  Java 
platform  is  available  to  Jess  programs.  Clearly,  static  analysis  of  Jess  alone  in  the  presence 
of  these  directives  is  no  longer  possible;  one  would  have  to  analyze  Jess  in  combination 
with  the  Jess  rules  being  interpreted.  When  I  use  Jess  as  one  of  my  example  programs  for 
this  thesis,  I  assume  that  these  particular  directives  are  not  used. 

Of course, Java’s original source of popularity was that it can dynamically load and run
code from arbitrary sources. This ability depends on the use of reflection. It also requires
the use of ClassLoaders, but ClassLoaders do not present any real problems for Ajax above
and beyond the difficulties of reflection.

Another, rather obscure, use of reflection is built into the Java compiler. The Java language
construct ClassName.class obtains the metaclass Class object for the class named
“ClassName”. The Sun Java compiler implements this feature by compiling in a call to
Class.forName("ClassName"), along with some caching of the return value to
speed up cases where the expression is evaluated frequently.
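
A rough picture of such a translation, written by hand in Java, is given below for the expression String.class. The class ClassLiteralSketch, the field cachedStringClass and the method stringClass are hypothetical names; the code actually emitted by the compiler may differ in its details.

```java
class ClassLiteralSketch {
    // Cache for the result of evaluating the class literal.
    private static Class<?> cachedStringClass;

    // Hand-written approximation of what "String.class" could be compiled into:
    // a reflective lookup by name whose result is remembered for later uses.
    static Class<?> stringClass() {
        if (cachedStringClass == null) {
            try {
                cachedStringClass = Class.forName("java.lang.String");
            } catch (ClassNotFoundException e) {
                throw new NoClassDefFoundError(e.getMessage());
            }
        }
        return cachedStringClass;
    }
}
```

The important point for analysis is that the argument to Class.forName is a constant string, so the class being reflected is known statically.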

8.5.3  Reflection  Specifications 

Ajax  allows  the  programmer  to  manually  provide  specifications  describing  how  a  program 
uses  reflection,  e.g.,  which  classes  it  can  create  instances  of  and  which  methods  it  can  call 
using the reflection API. Appendix C gives the actual specifications used in the experiments.

Reflection specifications describe a set of reflective methods, the methods that perform
reflection  operations.  For  each  reflective  method,  the  specifications  list  the  caller  methods, 
and  for  each  caller,  the  specifications  enumerate  the  classes,  methods  or  fields  it  may  access 
through  the  callee  reflection  method.  For  example,  consider  Figure  8-4. 


java.lang.reflect.Constructor.newInstance [
    javafig.gui.ModularEditor.handleCommandCallback {
        class=javafig.commands.*
    }
    ajax.tools.benchmarks.GeneralBenchmark.makePrintSinkStream {
        class=java.io.PrintStream
    }
]

Figure 8-4. Sample reflection specification

Figure 8-4 specifies that Constructor.newInstance is reflective. (This method creates
a new object using a constructor chosen at run time.) The specification states that there are
only two callers of this reflective method. The first caller, handleCommandCallback,
only uses the method to create objects of classes whose fully qualified names start with
“javafig.commands.”. The second caller uses it only to create objects of class
java.io.PrintStream. Note that once again every class, method and field name is fully
qualified with the declaring class name and package.

This  specification  format  has  two  advantages.  Ajax  can  check  during  analysis  that  every 
caller  to  a  reflective  method  is  actually  listed  in  the  specifications,  and  issue  warnings  when 
unknown  callers  are  found.  This  is  an  essential  aid  to  locating  all  uses  of  reflection  in  a 
program.  Also,  the  usage  of  reflection  can  be  computed  based  on  the  methods  that  Ajax 
finds  to  be  live;  dead  code  that  uses  reflection  does  not  impact  the  analysis.  This  means  that 
one  specification  file  can  describe  the  reflection  behavior  of  the  Java  libraries  and  a  set  of 
user  applications.  The  only  other  analysis  system  with  documented  support  for  reflection 
specifications,  Jax  [79],  only  allows  the  programmer  to  specify  one  list  of  methods  and 
classes  accessed  via  reflection,  and  does  not  allow  the  programmer  to  specify  which 
program methods perform reflective actions; thus it does not have these advantages.

Another advantage of this format is that a wrapper around a reflective method can itself be
added to the specifications as a new reflective method. This allows its callers to be easily
located and reported by Ajax.
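
The per-caller organization of the specifications, and the warning for unlisted callers, can be sketched with a nested map. This is an assumption-laden illustration rather than Ajax's implementation: the class ReflectionSpecSketch and all of its method names are hypothetical, and targets are kept as plain strings such as "class=javafig.commands.*".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReflectionSpecSketch {
    // reflective method -> caller -> targets the caller may reach through it
    private final Map<String, Map<String, List<String>>> spec = new HashMap<>();
    private final List<String> warnings = new ArrayList<>();

    void declare(String reflectiveMethod, String caller, String target) {
        spec.computeIfAbsent(reflectiveMethod, k -> new HashMap<>())
            .computeIfAbsent(caller, k -> new ArrayList<>())
            .add(target);
    }

    // Invoked when the analysis discovers a live call from caller to callee.
    // Only live calls are looked up, so dead reflective code contributes nothing.
    List<String> targetsForLiveCall(String caller, String callee) {
        Map<String, List<String>> callers = spec.get(callee);
        if (callers == null) {
            return Collections.emptyList();  // callee is not a reflective method
        }
        List<String> targets = callers.get(caller);
        if (targets == null) {
            warnings.add("unlisted caller of " + callee + ": " + caller);
            return Collections.emptyList();
        }
        return targets;
    }

    List<String> warnings() {
        return warnings;
    }
}
```

Declaring a wrapper method as reflective in such a table makes every caller of the wrapper subject to the same check.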

Ajax has a separate mechanism to handle the compiler-generated use of Class.forName
discussed above. During analysis, it detects when Class.forName is called with a
constant string parameter, and adds the named class to the list of classes which are
reflected. Therefore uses of the ClassName.class expression do not need to be listed
in the reflection specifications.

8.5.4  Reflection  Specification  Syntax 

The syntax is very simple. The example above demonstrates almost all the syntactic
features of the language. A reflective method can have an arbitrary number of callers, and
each caller can specify an arbitrary number of “reflection targets”. A reflective method and
its callers are specified as fully qualified method names; if disambiguation of overloaded
methods is required, the method name can be extended with a list of parameter types and
quoted as a string. The grammar is given in Figure 8-5. As for Salamis, the tokens are the
literal strings occurring in the grammar, plus Identifiers and quoted Strings.


ReflectionSpec   ::= ReflectiveMethod*

ReflectiveMethod ::= MethodName { Caller* }

MethodName       ::= Name
                   | String

Name             ::= Identifier
                   | Identifier . Name

Caller           ::= Name { ReflectionTarget* }

ReflectionTarget ::= TargetType = TargetSpec

TargetType       ::= class
                   | field
                   | method
                   | serialized

TargetSpec       ::= WildcardName
                   | WildcardName < Name

WildcardName     ::= Name
                   | Name .? *
                   | * .? Name

Figure 8-5. Reflection specification grammar


Reflection  targets  identify  the  classes,  methods  or  fields  that  may  be  referenced  by  the 
reflective  operation.  There  are  four  kinds  of  reflection  targets: 

•  Classes 

•  Methods 

•  Fields 

•  Serialized  Classes 

None  of  the  examples  I  have  analyzed  use  field  reflection. 

The “serialized class” targets are used to specify which classes of objects may be read from
storage using the ObjectInputStream deserialization machinery. If a class is a
“serialized class” target, then instances of that class may be returned from calls to
ObjectInputStream.readObject. The ObjectInputStream constructor is
treated as a reflective method; callers of the constructor specify which classes they will
deserialize using the stream. Strictly speaking the constructor is not a reflective method,
because objects are not deserialized and created until readObject is called on the
stream. However, it is more helpful to identify creators of object input streams than readers
of objects from those streams.

The  language  supports  two  shorthand  ways  to  specify  reflection  targets,  corresponding  to 
ways  that  reflection  is  frequently  used  in  practice: 

•  Wildcard  names,  e.g., 

javafig.commands.*

This means any class (or method) whose fully qualified name starts with
“javafig.commands.”. Wildcards need not be in trailing positions, e.g.,
“*.Handler” is allowed. Ajax searches through all the available classes, methods or
fields  to  find  the  ones  whose  names  match  the  pattern.  These  patterns  are  very  useful 
because  programs  often  prepend  or  append  some  constant  string  to  a  variable  before 
passing  a  name  to  the  reflection  API. 

•  Interface  constraints,  e.g., 

jess.*<jess.Test

This means any class matching the pattern “jess.*” which implements the named
interface jess.Test. This is also very useful because programs creating objects via
reflection usually require those objects to satisfy some known interface.
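
The two shorthand forms above can be sketched as a pair of matching functions. The class TargetPatterns and both matches methods are hypothetical names; the sketch assumes a pattern has at most one wildcard, either leading or trailing, and uses Class.isAssignableFrom for the interface constraint.

```java
class TargetPatterns {
    // Matches "prefix*" and "*suffix" patterns; any other pattern must match exactly.
    static boolean matches(String pattern, String name) {
        if (pattern.endsWith("*")) {
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        if (pattern.startsWith("*")) {
            return name.endsWith(pattern.substring(1));
        }
        return pattern.equals(name);
    }

    // The form "pattern<bound": the candidate's name must match the pattern
    // and the candidate must implement (or extend) the bound.
    static boolean matches(String pattern, Class<?> bound, Class<?> candidate) {
        return matches(pattern, candidate.getName()) && bound.isAssignableFrom(candidate);
    }
}
```

For example, the pattern "java.util.*" constrained by java.util.List accepts java.util.ArrayList but rejects java.util.HashMap, whose name matches but which does not implement the interface.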

Serialized  class  targets  undergo  additional  processing.  Every  serialized  class  target  must 
implement the java.io.Serializable interface, or it will be ignored. Also, for
every  field  of  a  serialized  class  which  is  not  marked  transient,  the  field’s  declared  class 
is  added  as  a  serialized  class  target.  (This  is  because  Java  serialization  automatically 
serializes  such  fields.)  Similarly,  if  an  array  class  is  serialized,  then  the  array  content  class 
is  also  serialized. 
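
The additional processing of serialized class targets is a closure computation, which can be approximated with the standard reflection API as follows. This is a sketch under stated assumptions, not Ajax's code: SerializedTargets, Car, Wheel and Motor are hypothetical names, and the sketch inspects only the fields declared directly in each class (inherited fields are not followed).

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

class SerializedTargets {
    // Starting from one serialized class target, collects every class that
    // becomes a serialized class target by the rules described above.
    static Set<Class<?>> close(Class<?> root) {
        Set<Class<?>> targets = new LinkedHashSet<>();
        Deque<Class<?>> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Class<?> cls = pending.pop();
            if (!Serializable.class.isAssignableFrom(cls) || !targets.add(cls)) {
                continue;  // not Serializable (ignored), or already processed
            }
            if (cls.isArray()) {
                pending.push(cls.getComponentType());  // array content class
                continue;
            }
            for (Field field : cls.getDeclaredFields()) {
                if (!Modifier.isTransient(field.getModifiers())) {
                    pending.push(field.getType());     // declared class of the field
                }
            }
        }
        return targets;
    }
}

// Hypothetical example classes used to exercise the sketch.
class Wheel implements Serializable { int radius; }
class Motor implements Serializable { }
class Car implements Serializable {
    Wheel[] wheels;         // followed: array class, then its content class
    transient Motor motor;  // skipped: transient fields are not serialized
}
```

Starting from Car, the closure contains Car, Wheel[] and Wheel; Motor is excluded because its only reference is transient, and int is ignored because it does not implement Serializable.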

8.5.5  Creating  The  Specifications 

Writing  reflection  specifications  requires  some  reverse  engineering  of  the  reflection-using 
code.  I  used  a  combination  of  dynamic  and  static  methods.  I  ran  the  example  programs  and 
noted which classes were loaded and which methods were called. I also examined the
bytecode (and source code, when available) and determined which classes and methods
could be accessed.

The  specifications  I  produced  use  two  simplifications  to  reduce  the  number  of  possible 
classes that may be loaded. First, the character set locale name is assumed to be “Cp1252”,
the Windows Latin character set. Second, the locale is assumed to be US English. If all
available  character  sets  and  locales  are  allowed,  the  very  large  amount  of  code  loaded  to 
support  them  totally  dominates  the  size  of  my  example  programs,  and  most  configurations 
of  SEMI  are  quite  impractical. 

8.5.6  Using  Reflection  Specifications 

Reflective  methods  ultimately  depend  on  foreign  code.  (The  reflective  methods  that  appear 
in  the  Java  library  are  actually  wrappers  around  foreign  methods  that  do  the  real  work.)  I 
have  written  Salamis  specifications  for  those  foreign  methods  that  take  care  of  mundane 
aspects  such  as  throwing  exceptions,  and  delegate  the  essential  reflective  operations  to  a 
special  set  of  foreign  functions.  These  functions  are: 

•  ReflectionHandler_makeObjectAndCallZeroArgConstructor

Creates an instance of some reflected class with a constructor that takes no arguments,
and invokes that constructor on the object.

•  ReflectionHandler_makeObjectAndCallArbitraryConstructor

Creates an instance of some reflected class and invokes one of the constructors on the
object; the parameters to the constructor are passed to this function as an array.

•  ReflectionHandler_callArbitraryMethod

Calls a reflected method on some object. The parameters are passed into this function
as an array.

•  ReflectionHandler_makeSerializedObject

Creates an instance of a serialized non-array class. No constructor is invoked.

•  ReflectionHandler_makeSerializedArray

Creates an instance of a serialized array class.

•  ReflectionHandler_assignSerializedField

This is actually a family of functions, one per primitive type and one for Object.
Given an object and a value of the appropriate type, it sets one of the serialized fields of
the object to the given value.

•  ReflectionHandler_getSerializedField

This is actually a family of functions, one per primitive type and one for Object.
Given an object, it returns the value of one of the serialized fields of the object with the
appropriate type.

•  ReflectionHandler_invoke_readObject

Given an object which has a private readObject method implementing custom
serialization behavior, this function calls that method on the object.

•  ReflectionHandler_invoke_writeObject

Given an object which has a private writeObject method implementing custom
serialization behavior, this function calls that method on the object.

Since none of my examples use reflection to modify object fields (other than for
serialization), I did not build support for that functionality.

These  functions  cannot  be  specified  statically  in  Salamis  code  because  they  depend  on 
knowing the set of reflected classes, methods, and serialized classes. Instead, their
specifications are generated dynamically. As analysis progresses and live methods are discovered,
they  are  looked  up  in  the  reflection  specification.  Any  induced  reflected  classes,  methods 
or  serialized  classes  are  added  to  a  global  list  of  reflected  entities.  Whenever  this  list  is 
updated,  Ajax  generates  new  specifications  for  the  primitive  reflection  functions.  (Ajax 
analyses  support  code  mutation,  so  they  can  handle  changes  in  the  specifications  even  if  the 
reflection  functions  have  already  been  analyzed.) 
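
The incremental scheme described above can be pictured with a small sketch. Everything in it is hypothetical: the class name, the map standing in for the reflection specification, and the generation counter standing in for "generate new specifications" are illustrative assumptions, not Ajax's actual data structures.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not Ajax's implementation.
final class ReflectedEntityRegistry {
    // Reflection specification: live method -> entities it may reflect on.
    private final Map<String, List<String>> reflectionSpec;
    // Global list of reflected entities discovered so far.
    private final Set<String> reflectedEntities = new LinkedHashSet<String>();
    // Number of times the primitive specifications were regenerated.
    private int generation = 0;

    ReflectedEntityRegistry(Map<String, List<String>> reflectionSpec) {
        this.reflectionSpec = reflectionSpec;
    }

    // Called whenever the analysis discovers a new live method.
    // Returns true if the global list grew and regeneration was triggered.
    boolean onLiveMethod(String method) {
        List<String> induced = reflectionSpec.get(method);
        if (induced == null) {
            induced = Collections.<String>emptyList();
        }
        boolean grew = reflectedEntities.addAll(induced);
        if (grew) {
            generation++; // stand-in for emitting new specifications
        }
        return grew;
    }

    int generation() {
        return generation;
    }

    Set<String> reflectedEntities() {
        return Collections.unmodifiableSet(reflectedEntities);
    }
}
```

In this sketch a live method that induces no new entities leaves the generation unchanged, which mirrors the idea that specifications are regenerated only when the list is updated.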

8.6  Conclusions 

Java  programs  have  rich  interactions  with  their  environment.  These  interactions  must  be 
modelled  accurately  to  achieve  sound  and  accurate  analysis.  Unfortunately,  this  is  very 
difficult  to  do;  the  details  of  the  environment  are  inaccessible,  incomprehensible,  and 
subject  to  change.  Even  worse,  the  environment  provides  reflection  facilities  allowing  Java 
programs  to  modify  their  own  behavior  in  ways  that  are  opaque  to  static  analysis. 

Ajax  addresses  these  concerns  by  providing  ways  to  specify  the  environment  and  a 
program’s  reflective  behavior.  These  mechanisms  work,  but  they  can  be  laborious  for  both 
the  tool  implementor  and  user.  More  seriously,  any  attempt  to  specify  the  environment  and 
reflective  behavior  seems  doomed  to  be  fragile,  for  the  reasons  explained  above. 

Although  these  concerns  can  be  tightly  constrained  or  eliminated  in  some  domains  (e.g., 
embedded  systems),  general  purpose  systems  design  is  moving  in  the  direction  of  more  of 
these  kinds  of  problems.  Distributed  systems,  dynamism  and  introspection  are  increasingly 
likely  to  be  the  norm.  Even  embedded  systems  are  increasingly  likely  to  be  attached  to 
networks  and  to  exhibit  these  features  —  for  example,  the  Jini  “smart  devices”  framework 
depends  on  them.  Static  analysis  cannot  ignore  this  challenge. 

9 Performance


9.1  Introduction 

This  chapter  describes  the  resource  consumption  and  accuracy  of  the  basic  analyses 
RTA++ and SEMI for some simple applications: resolving virtual method calls and
identifying each program’s live code. The focus is on measured performance rather than
theoretical estimates or bounds, because performance depends crucially on the
characteristics of the programs being analyzed.

The  results  report  accuracy  in  terms  of  application  metrics  (e.g.,  the  number  of  virtual  call 
sites  successfully  resolved  to  a  single  callee).  Metrics  internal  to  an  analysis  algorithm  (e.g., 
the  average  size  of  points-to  sets)  can  be  useful  for  diagnosing  the  behavior  of  a  particular 
algorithm,  but  are  not  as  useful  for  comparing  different  analysis  algorithms. 

Before  I  describe  the  performance  of  the  algorithms,  I  describe  the  suite  of  example 
programs  and  the  test  setup.  It  is  difficult  to  measure  the  sizes  of  the  programs,  partly 
because  it  is  difficult  to  describe  precisely  what  code  constitutes  each  program.  This  is 
interesting because it also makes whole-program static analysis hard.

One  goal  of  this  thesis  was  to  test  the  scalability  of  SEMI-style  analysis  applied  to  Java 
programs.  My  results  show  that  treating  methods  as  functions  passed  around  in  records 
imposes  a  significant  penalty,  and  prevents  the  largest  examples  from  being  treated  within 
the resource limits I have set. However, this treatment can handle some large and
interesting programs, including the Ajax system itself with all the libraries on which it depends.

Ajax  has  many  tunable  parameters  that  can  alter  the  accuracy  and  resource  consumption  of 
the system. In my results here, and in subsequent chapters, I focus on proving or disproving
specific  hypotheses  rather  than  attempting  to  characterize  completely  the  performance  of 
the  system  in  all  possible  configurations. 

9.2  Benchmark  Environment 

9.2.1  System 

Table  9-1  gives  the  specifications  of  the  machine  running  the  test. 

9.2.2  Benchmark  Examples 

I  use  a  suite  of  ten  benchmark  programs,  described  in  Table  9-2.  Each  program  is  analyzed 
in  conjunction  with  the  libraries  provided  in  Sun’s  JDK  1.1.7.  These  programs  cover  a 
range  of  sizes  and  programming  styles. 

CPU               500MHz Pentium II
RAM               256MB
Swap Space        600MB
Java VM           Sun JDK 1.3.0, Hotspot Client VM
Java Heap Size    192MB
Operating System  Windows NT 4.0, Service Pack 5

Table 9-1. Environment specifications


Program Name  Description
Ajax          The downcast checking tool of my analysis system
CTAS          The Connection Manager for a prototype air traffic control system,
              in a test harness, from Daniel Jackson’s group at MIT [43]
Jar           The JAR compressed archive manager from Sun’s JDK 1.1.7
Java2HTML     Converts Java source code to pretty HTML, from Rustan Leino at
              DEC/Compaq SRC
JavaC         The Java source-to-bytecode compiler from Sun’s JDK 1.1.7
JavaCC        The Java Compiler Compiler from Sun Labs, version 0.8pre1
              (similar to Yacc)
JavaFIG       The JavaFIG 1.3.4 drawing editor from Universitaet Hamburg
JavaP         The Java bytecode disassembler supplied with Sun’s JDK 1.1.7
Jess          Java Expert System Shell version 4.4, from Sandia National Labs [35]
Ladybug       The Ladybug specification checker, by Craig Damon at CMU [44]

Table 9-2. The example programs


Table  9-3  records  the  program  sizes.  Measuring  the  size  of  a  program  in  this  context  is 
perplexing.  The  first  difficulty  is  that  only  four  of  the  programs  —  Ajax,  CTAS,  Jess  and 
Java2HTML  —  come  with  complete  source  code,  so  measures  such  as  “lines  of  code”  are 
inapplicable. 

More  seriously,  for  each  example,  the  code  actually  analyzed  is  neither  a  superset  nor  a 
subset  of  the  code  comprising  the  “application.”  (By  “application,”  I  mean  a  body  of  code 
that  one  downloads  and  installs  as  a  unit.)  In  most  cases  the  analyzed  code  is  much  larger 
than  the  application  code,  because  Ajax  analyzes  all  libraries  on  which  the  application 
depends,  as  well  as  the  application  itself.  On  the  other  hand,  Ajax  only  analyzes  the  code 
that  it  detects  to  be  live.  Some  applications,  such  as  Ajax  and  JavaFIG,  consist  of  several 
independently  runnable  programs;  therefore,  whichever  program  is  analyzed,  a  significant 
amount  of  the  application  code  falls  outside  the  program.  For  Jar,  JavaP  and  JavaC  there  is 
no  clear  boundary  between  the  application  and  the  JDK  libraries,  and  the  separation  into 
application  and  library  code  is  somewhat  arbitrary. 

Name       App.     App.     App.     App.       Total    Total    Total Live
           Source   Classes  Methods  Bytecode   Live     Live     Bytecode
           Lines                      Bytes      Classes  Methods  Bytes
Ajax       45,086   505      3,145    171,237    537      3,463    197,398
CTAS       6,909    60       365      17,350     283      1,527    86,523
Jar        N/A      8        85       6,142      304      1,752    104,979
Java2HTML  543      5        32       2,498      101      388      12,316
JavaC      N/A      122      948      68,859     417      2,817    192,528
JavaCC     N/A      134      1,975    250,653    161      1,322    170,741
JavaFIG    N/A      175      2,139    170,655    496      3,902    250,725
JavaP      N/A      58       577      52,215     143      705      32,026
Jess       36,366   173      821      51,468     383      1,854    110,526
Ladybug    ~57,000  389      3,109    238,755    731      5,277    346,491

Table 9-3. Size statistics for the example programs


Some  features  of  the  example  programs  skew  these  statistics.  Ajax  and  JavaCC  contain 
JavaCC-generated  code,  although  Ajax’s  generated  code  is  not  actually  analyzed.  Ladybug 
contains  code  generated  by  a  different  parser  generator,  JavaCUP.  Thus,  the  characteristics 
of  these  programs  are  partly  determined  by  the  design  of  the  parser  generator.  These 
characteristics may differ from those of “handwritten” code, but it is
important  and  interesting  to  examine  both  handwritten  and  machine  generated  code. 

Another  problem  is  that  static  “class  initializer”  methods  are  often  unlike  other  methods  in 
the  program.  The  Java  bytecode  format  has  no  way  to  represent  an  initialized  array; 
therefore  all  constant  arrays  are  constructed  at  run  time  within  the  class’  static  initializer. 
Usually  at  least  five  bytes  of  bytecode  instructions  are  required  per  array  element.  Thus, 
many  class  initializer  methods  are  huge  compared  to  other  methods,  and  in  some  programs 
they  dominate  the  overall  bytecode  instruction  count.  All  results  in  this  thesis  exclude  static 
class  initializer  methods  from  statistics  about  methods.  In  particular,  the  method  counts  and 
bytecode  byte  counts  in  Table  9-3  exclude  static  class  initializer  methods.  This  does  mean 
that  some  legitimate  code  is  excluded  from  the  reports,  but  it  improves  the  meaningfulness 
of  the  results  overall.  These  omissions  are  only  in  the  reporting  of  results  —  the  analyses 
take  the  behavior  of  the  static  class  initializers  fully  into  account. 
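
The per-element cost mentioned above can be estimated for int arrays. The sketch below assumes the standard JVM instruction sizes (dup and iastore take 1 byte, iconst_<n> 1 byte, bipush 2 bytes, sipush 3 bytes), that the compiler emits one store per element, and that larger constants are loaded with the short form of ldc (2 bytes). The class and method names are hypothetical.

```java
// Hypothetical helper estimating the bytecode emitted in a class initializer
// to fill a constant int[]: per element, `dup; push index; push value; iastore`.
final class ArrayInitCost {
    // Size in bytes of the shortest instruction that pushes an int constant.
    static int pushSize(int v) {
        if (v >= -1 && v <= 5) return 1;                           // iconst_m1 .. iconst_5
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) return 2;   // bipush
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) return 3; // sipush
        return 2; // ldc with a one-byte constant pool index (assumption)
    }

    // Bytes spent storing the elements; the array allocation itself is excluded.
    static int elementBytes(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            // dup + index push + value push + iastore
            total += 1 + pushSize(i) + pushSize(values[i]) + 1;
        }
        return total;
    }
}
```

Under these assumptions, any element at index 6 or higher costs at least five bytes, because the index push alone needs two or three bytes.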

In  Table  9-3,  the  “Total  Live  Classes”  number  is  simply  the  number  of  classes  containing 
at  least  one  method  body  which  Ajax  determines  to  be  live.  The  “Total  Live  Methods” 
records the number of method bodies determined to be live (excluding static class
initializers), and the “Total Live Bytecode Bytes” is the sum of the sizes of those methods. Here
the  set  of  live  methods  was  computed  using  the  “RTA++”  analysis.  (Other  analyses 
compute  smaller  sets  of  live  methods.) 
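
As a rough illustration of how the three "Total Live" columns relate, the following sketch derives them from a list of live method bodies. The LiveMethod and LiveSizeStats types are hypothetical. The sketch also assumes that a class whose only live body is its static initializer still counts as a live class, which the text does not state either way.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical description of one live method body; not an Ajax type.
final class LiveMethod {
    final String declaringClass;
    final String name;
    final int bytecodeBytes;

    LiveMethod(String declaringClass, String name, int bytecodeBytes) {
        this.declaringClass = declaringClass;
        this.name = name;
        this.bytecodeBytes = bytecodeBytes;
    }
}

final class LiveSizeStats {
    final int liveClasses;
    final int liveMethods;
    final int liveBytes;

    LiveSizeStats(Collection<LiveMethod> live) {
        Set<String> classes = new HashSet<String>();
        int methods = 0;
        int bytes = 0;
        for (LiveMethod m : live) {
            classes.add(m.declaringClass); // class has at least one live body
            if (m.name.equals("<clinit>")) {
                continue; // static class initializers are left out of the reports
            }
            methods++;
            bytes += m.bytecodeBytes;
        }
        this.liveClasses = classes.size();
        this.liveMethods = methods;
        this.liveBytes = bytes;
    }
}
```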

JavaFIG  and  Ladybug  are  the  only  two  applications  that  use  the  AWT  user  interface  library, 
and  that  library  accounts  for  much  of  the  code  that  is  pulled  in  from  outside  the  application. 

Figure 9-1 shows the size of each example program, as the number of live methods. Figures
9-2  and  9-3  show  that  the  number  of  live  methods  is  a  reasonably  good  measure  of  program 
size,  being  well  correlated  with  the  number  of  classes  and  number  of  bytes  of  bytecode 
instructions  for  each  program.  This  correlation  is  improved  by  the  fact  that  the  programs 
share  a  great  deal  of  code  (the  JDK  libraries). 


Figure  9-1.  Example  program  sizes 

Figure  9-4  shows  that,  considering  only  code  outside  the  JDK  library,  the  correlation 
between  bytecode  bytes  and  number  of  methods  is  still  nearly  linear,  except  that  Ajax  has 
unusually  small  methods  and  JavaCC  has  unusually  large  ones. 

Figure  9-5  shows  that  for  application  code,  the  number  of  methods  per  class  varies  greatly. 

9.3  Tools 

In this chapter, I consider two tools: virtual method call resolution and live code
identification. Other tools and their performance are discussed in later chapters. Here I focus on
comparing  the  performance  of  different  algorithms  and  configurations. 

9.3.1  Virtual  Call  Resolution 

Virtual  call  resolution  is  the  problem  of  determining,  for  each  virtual  method  invocation 
site,  a  superset  of  the  actual  method  bodies  that  may  be  invoked  by  the  call.  This  chapter 
examines  the  performance  of  the  virtual  call  resolution  technique  described  in 
Section  4.3.4. 

Figure 9-2. Correlation between number of methods and number of classes

Figure 9-3. Correlation between bytecode bytes and number of methods

Figure 9-4. Correlation between bytecode bytes and number of methods, for application code

Figure 9-5. Correlation between number of methods and number of classes, for application code

The virtual call resolution tool scans each live method found by the analysis and identifies
the occurrences of invokevirtual and invokeinterface instructions. Each such
instruction is considered a “virtual method invocation site”, unless the callee method is
declared final or its declaring class is final, in which case it is ignored (being trivial
to resolve statically). For each site, the tool collects and outputs the set of possible callee
method implementations. Section 4.3.4 describes how sets with more than one element are
abstracted to a single “many” value. In the implementation, the threshold is configurable;
the entire set of possible callees can be retrieved by setting it to a large integer.

Note  that  calls  to  private  methods,  constructors,  static  methods,  and  superclass 
methods (via super) all use the invokestatic or invokespecial instructions and
so  are  ignored  by  the  virtual  call  resolver. 

The  tool  summarizes  its  results  by  reporting  three  numbers: 

•  The  number  of  virtual  method  invocation  sites  found. 

•  The  number  of  sites  resolved,  i.e.,  the  number  of  sites  with  zero  or  one  possible  callees. 

•  The  number  of  sites  dead,  i.e.,  the  number  of  sites  with  zero  callees.  A  dead  site  is 
either never executed or else, whenever it is executed, the object reference used for
dispatch is always null (and therefore an exception is thrown).

The  key  accuracy  metric  is  the  ratio  of  the  first  two  numbers:  the  percentage  of  sites 
resolved. 
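
The three summary numbers can be computed from a per-site callee map as in the sketch below. The CallSiteSummary class and the string keys for sites and callees are hypothetical. Note that, following the definitions above, a dead site is also counted as resolved.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical summary over a map from call site to its possible callees.
final class CallSiteSummary {
    final int sites;
    final int resolved;
    final int dead;

    CallSiteSummary(Map<String, Set<String>> calleesBySite) {
        int r = 0;
        int d = 0;
        for (Set<String> callees : calleesBySite.values()) {
            if (callees.size() <= 1) {
                r++; // zero or one possible callee
            }
            if (callees.isEmpty()) {
                d++; // no callee at all: the site is dead
            }
        }
        this.sites = calleesBySite.size();
        this.resolved = r;
        this.dead = d;
    }

    // Key accuracy metric: percentage of sites resolved.
    double percentResolved() {
        return sites == 0 ? 0.0 : 100.0 * resolved / sites;
    }
}
```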

As  discussed  above,  because  of  the  frequently  anomalous  nature  of  class  initializer 
methods,  sites  within  class  initializer  methods  are  not  included  in  the  statistics.1 

9.3.2  Live  Code  Identification 

Live  code  identification  is  the  task  of  determining  a  set  of  method  bodies  that  is  a  superset 
of  the  actual  method  bodies  that  may  be  executed  by  the  program.  (Alternatively,  it  can  be 
thought  of  as  the  task  of  determining  a  set  of  method  bodies  that  are  guaranteed  never  to  be 
executed  by  the  program.)  This  chapter  benchmarks  the  VPR-based  technique  described  in 
Section  4.3.5. 

The  tool  summarizes  its  results  by  reporting  two  numbers: 

•  The  number  of  dead  method  bodies  found  in  the  application  code 

•  The  total  number  of  method  bodies  found  in  the  application  code 

The  ratio  of  these  two  numbers  is  the  key  accuracy  metric  here:  the  percentage  of  methods 
in  the  application  found  to  be  dead. 

Class  initializer  methods  are  counted  in  these  statistics  because  they  cannot  significantly 
skew  the  results. 

The  results  for  this  task  do  not  vary  much  across  analyses.  A  simple  analysis  such  as  RTA 
seems  to  get  close  to  the  “true”  set  of  live  methods,  so  there  is  little  room  for  improvement. 
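
For readers unfamiliar with this style of analysis, the sketch below shows a plain RTA-style reachability pass over a toy program model. It is a simplified illustration, neither Ajax's RTA++ nor the VPR-based tool, and all type names and the string encoding of methods are assumptions made for the example. The entry point is assumed to have a body in the model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy method body: the classes it allocates and the virtual calls it makes.
final class ToyMethod {
    final List<String> instantiates;   // classes allocated with `new`
    final List<String[]> virtualCalls; // {static receiver class, method name}

    ToyMethod(List<String> instantiates, List<String[]> virtualCalls) {
        this.instantiates = instantiates;
        this.virtualCalls = virtualCalls;
    }
}

final class ToyRta {
    final Map<String, String> superclass = new HashMap<String, String>(); // class -> superclass
    final Map<String, ToyMethod> bodies = new HashMap<String, ToyMethod>(); // "Class.method" -> body

    boolean isSubclass(String c, String ancestor) {
        for (String k = c; k != null; k = superclass.get(k)) {
            if (k.equals(ancestor)) return true;
        }
        return false;
    }

    // Finds the body that a call to `name` dispatches to on an object of class c.
    String dispatch(String c, String name) {
        for (String k = c; k != null; k = superclass.get(k)) {
            if (bodies.containsKey(k + "." + name)) return k + "." + name;
        }
        return null;
    }

    // Fixed point: a method is live if it is the entry point, or the target of
    // a virtual call in a live method when dispatched on an instantiated class.
    Set<String> liveMethods(String entry) {
        Set<String> live = new TreeSet<String>();
        Set<String> instantiated = new TreeSet<String>();
        live.add(entry);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String m : new ArrayList<String>(live)) {
                ToyMethod body = bodies.get(m);
                changed |= instantiated.addAll(body.instantiates);
                for (String[] call : body.virtualCalls) {
                    for (String c : instantiated) {
                        if (!isSubclass(c, call[0])) continue;
                        String target = dispatch(c, call[1]);
                        if (target != null) changed |= live.add(target);
                    }
                }
            }
        }
        return live;
    }
}
```

In this model a method body whose class is never instantiated stays dead even if a call site names its class as the static receiver type.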


1. One example is the class initializer for the class sun.io.CharacterEncoding, which contains 411
virtual calls to Hashtable.put. This would account for more than half of the virtual call sites in some
examples.

9.4 Performance of RTA++

Figure  9-6  shows  the  memory  required  for  Ajax  to  analyze  the  example  programs  with 
RTA++  for  the  two  tasks  of  virtual  method  call  resolution  and  live  code  identification. 
Figure  9-7  shows  the  time  taken.  RTA++  is  fast  in  each  case.  The  two  tasks  have  similar 
resource  requirements. 


Figure  9-6.  Memory  consumption  of  RTA++ 


The  quality  of  the  RTA++  results  is  presented  later,  in  comparison  with  the  results  for 
SEMI. 

9.5  Performance  of  SEMI 

9.5.1  Overview 

Figure  9-8  shows  the  amount  of  memory  used  by  SEMI  in  a  “high  accuracy”  configuration, 
for  both  the  virtual  call  resolution  and  live  code  identification  tasks.  Figure  9-9  shows  the 
time  taken.  The  missing  bars  indicate  that  the  analysis  did  not  terminate  within  three  hours. 

All  configurations  of  SEMI  presented  in  this  chapter  use  RTA++  to  resolve  virtual  method 
invocations  where  possible  before  applying  SEMI  (see  Section  7.8.1).  In  this  “high 
accuracy”  configuration,  SEMI  performs  precise  analysis  for  the  remaining  virtual  method 
calls  but  turns  off  full  polymorphic  recursion;  this  decision  is  explained  below. 

These  results  also  show  that  using  SEMI,  differences  in  the  resource  requirements  of 
the  two  tools  are  more  pronounced.  The  reason  is  that  the  tool-specific  data  are  propa¬ 
gated  over  much  larger  graphs  for  SEMI  than  for  RTA++. 

Figure 9-8. Space consumption of SEMI configured for high accuracy

Figure 9-9. Time consumption of SEMI configured for high accuracy

9.5.2  Performance  of  SEMI  in  Different  Configurations 

Now  I  consider  configuring  SEMI  for  reduced  accuracy  but  greater  efficiency.  Figure  9-10 
shows  the  memory  consumption  for  live  method  detection  using  all  combinations  of  the 
PolyRec  and  HighOrder  options.  Figure  9-11  shows  the  time  used. 

•  When  PolyRec  is  enabled,  full  polymorphic  recursion  is  used.  Otherwise  polymorphic 
recursion  is  mostly  suppressed  (see  Section  7.3.6). 

• When HighOrder is enabled, virtual method calls are analyzed by the precise
techniques described in Chapter 6; otherwise the program is treated as first-order by SEMI,
using RTA++ to compute all the possible callees of each virtual call site (see
Section 7.11).

The  technique  described  in  Section  7.11  for  transforming  the  programs  to  first-order 
code  significantly  reduces  the  resource  usage,  making  some  large  examples  tractable 
that  were  previously  intractable.  Abandoning  full  polymorphic  recursion  reduces 
resource  requirements  with  HighOrder  enabled,  but  gives  mixed  results  with 
HighOrder  disabled. 

9.5.3  Accuracy  of  SEMI  in  Different  Configurations 

The  settings  of  the  PolyRec  and  HighOrder  options  affect  the  accuracy  of  the  analysis. 
Figure  9-12  shows  results  for  live  method  detection.  Figure  9-13  shows  results  for  virtual 
call  resolution. 

Figure 9-10. Space consumption of SEMI in four configurations, for live method detection

Figure 9-11. Time consumption of different SEMI configurations, for live method detection

Figure 9-12. Accuracy of SEMI configurations for live method detection

Figure 9-13. Accuracy of SEMI configurations for virtual method call resolution


A  large  number  of  dead  methods  are  found  in  the  application  code  of  Ajax,  CTAS,  Jar, 
JavaCC,  JavaFIG  and  JavaP.  In  these  examples,  the  “application  code”  actually  comprises 
several  different  programs,  only  one  of  which  is  analyzed  by  Ajax. 

The  results  for  virtual  call  resolution  show  a  slight  anomaly:  turning  off  full  polymorphic 
recursion  actually  improves  accuracy  for  Jess.  Normally,  restricting  polymorphic  recursion 
can  only  decrease  accuracy.  In  this  case,  slight  variations  in  the  order  of  constraint 
processing determine whether calls to System.err.println are resolved or not.

Restricting  polymorphic  recursion  does  not  significantly  affect  accuracy  for  either 
live  method  detection  or  virtual  call  resolution. 

Different  SEMI  configurations  produce  little  variation  in  the  results  for  live  method 
detection. 

For  virtual  call  resolution,  enabling  HighOrder  significantly  improves  accuracy. 

Many  virtual  method  call  sites  do  have  more  than  one  possible  callee,  so  even  an  oracle 
would  resolve  fewer  than  100%  of  virtual  call  sites.  Therefore,  an  improvement  from  (for 
example)  88%  to  89%  of  call  sites  resolved  is  significant,  as  it  should  be  considered  a 
reduction  of  at  least  10%  in  the  number  of  resolvable  but  unresolved  call  sites. 
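
The arithmetic behind this remark can be made explicit. If an oracle could resolve a percentage `oracle` of the sites, then moving from `before` to `after` removes (after - before) / (oracle - before) of the resolvable but unresolved sites. The text does not give the oracle rate; the sketch below uses a hypothetical helper and illustrative values, and shows that the figure of 10% corresponds to an oracle rate of 98%.

```java
// Hypothetical helper; the oracle percentages used with it are illustrative.
final class ResolutionGain {
    // Fraction of the resolvable-but-unresolved sites removed when the
    // resolved percentage rises from `before` to `after`, given the
    // percentage `oracle` that a perfect analysis could resolve.
    static double relativeReduction(double before, double after, double oracle) {
        if (!(before <= after && after <= oracle && before < oracle)) {
            throw new IllegalArgumentException("need before <= after <= oracle and before < oracle");
        }
        return (after - before) / (oracle - before);
    }
}
```

For example, `relativeReduction(88, 89, 98)` is 0.1, and a lower oracle rate such as 93 gives 0.2; the reduction is at least 10% whenever the oracle rate is 98% or less.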

Using  HighOrder  never  decreases  accuracy  in  practice.  Section  7.11  explains  why  this 
might  not  necessarily  be  so. 

9.5.4  Component  Partitioning  in  SEMI 

In Section 7.9.1, I claimed that component partitioning improved the performance of
SEMI,  in  particular  when  object  field  components  were  partitioned  according  to  the 
declaring  class  of  each  field.  Figure  9-14  shows  the  memory  consumption  of  three  different 
configurations  of  SEMI  applied  to  the  live  method  detection  problem.  Figure  9-15  shows 
the  time  consumption.  The  configurations  all  use  PolyRec  but  not  HighOrder,  and  each 
configuration  uses  a  different  partitioning  scheme. 

Clearly,  “by  class”  uses  about  the  same  amount  of  memory  as  having  no  partitioning.  “By 
hierarchy”  (see  Section  7.9)  uses  substantially  more  in  most  cases.  Furthermore,  “by 
hierarchy”  is  often  much  slower  and  “by  class”  is  usually  fastest,  sometimes  significantly 
faster  than  “none”. 

These  results  verify  the  claim  that  partitioning  object  field  components  according  to  the 
declaring  class  of  each  field  is  a  good  idea. 

9.6  RTA++  and  SEMI  Intersection 

9.6.1  Basic  Results 

Ajax  can  be  configured  to  compute  the  intersection  of  the  results  of  two  analyses,  and  the 
result  is  guaranteed  to  be  at  least  as  accurate  as  each  analysis  applied  separately.  Because 
RTA++  is  cheap,  intersecting  it  with  SEMI  is  not  much  more  expensive  than  running  SEMI 
alone.  The  resulting  analysis  is  denoted  “SEMI  &  RTA++”. 

Figure 9-17 compares the accuracy of SEMI & RTA++, SEMI, and RTA++, using neither
HighOrder nor PolyRec, for virtual call resolution. The results show that SEMI &
RTA++ is significantly more accurate than SEMI for this task, and SEMI is usually
more accurate than RTA++.

Figure 9-14. Memory consumption for different component partitioning schemes

Figure 9-15. Time consumption for different component partitioning schemes


RTA++  improves  on  SEMI  because  RTA++  can  use  information  about  downcasts  that 
SEMI  ignores.  For  example,  consider  the  code  in  Figure  9-16.  SEMI  cannot  accurately 
encode  the  downcast  in  the  type  system;  downcasts  are  treated  as  identity  functions. 
Therefore  SEMI  infers  the  same  type  for  s,  i,  the  contents  of  v,  and  s2,  and  SEMI 
concludes  that  s2  and  i  may  be  aliased.  However,  using  the  Java  type  information  with 
RTA++,  it  is  clear  that  s2  and  i  are  not  aliased. 


void myMethod(Vector v, String s, Integer i) {
    v.addElement(... ? s : i);
    if (...) {
        String s2 = (String)v.elementAt(0);
    }
}

Figure  9-16.  Example  Of  RTA++  Improving  SEMI 
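
The difference between the two treatments of the downcast can be imitated on sets of run-time classes. The sketch below is a simplified model in which an abstract value is a set of java.lang.Class objects; this representation and the class name DowncastFilter are assumptions for illustration and do not reflect how SEMI or RTA++ represent values.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of two ways to propagate values through a downcast.
final class DowncastFilter {
    // Treats the cast as an identity function: every class flows through.
    static Set<Class<?>> castAsIdentity(Set<Class<?>> operand) {
        return new LinkedHashSet<Class<?>>(operand);
    }

    // Uses the declared cast target: only classes assignable to it survive,
    // because any other object would make the cast throw.
    static Set<Class<?>> castWithTypeInfo(Set<Class<?>> operand, Class<?> target) {
        Set<Class<?>> out = new LinkedHashSet<Class<?>>();
        for (Class<?> c : operand) {
            if (target.isAssignableFrom(c)) {
                out.add(c);
            }
        }
        return out;
    }
}
```

With vector contents modelled as {String, Integer}, the identity treatment leaves Integer among the possible classes of s2, whereas filtering by the cast target String removes it.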

Figure  9-18  gives  the  same  results  for  live  method  detection.  This  task  has  the  same 
pattern as virtual call resolution but, as before, the differences are much smaller.

Figure 9-17. Accuracy of three different analyses for virtual call resolution


Figure 9-19 gives the time used for virtual call resolution, for the three analyses.
Figure 9-20 gives the space consumed. SEMI & RTA++ is not much more expensive than
running SEMI alone.


Figure  9-19.  Time  required  by  three  different  analyses  for  virtual  call  resolution 


Figure  9-20.  Space  required  by  three  different  analyses  for  virtual  call  resolution 

9.6.2  Set  Sizes 

As  discussed  in  Section  4.3.4  and  Section  4.4.5,  the  accuracy  of  an  intersection-based 
analysis  can  depend  on  the  maximum  size  of  the  data  sets  allowed  by  the  set  abstraction 
function.  Figure  9-21  shows  the  results  of  SEMI  &  RTA++  using  different  set  sizes. 

Changing  the  set  size  has  no  practical  effect  on  the  accuracy  of  SEMI  &  RTA++. 
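
To see why the set size could matter in principle, consider the sketch below. It models an abstract value as either a set of at most `limit` callees or a single "many" value, and intersects two such values. The BoundedSet class is hypothetical and only illustrates the idea; the actual abstraction function is the one defined in Section 4.3.4.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical abstract value: a set of at most `limit` callees, or "many".
final class BoundedSet {
    final Set<String> elements; // null encodes the "many" value

    private BoundedSet(Set<String> elements) {
        this.elements = elements;
    }

    // Abstracts an exact set: sets larger than the limit collapse to "many".
    static BoundedSet abstractOf(Set<String> exact, int limit) {
        if (exact.size() > limit) {
            return new BoundedSet(null);
        }
        return new BoundedSet(new TreeSet<String>(exact));
    }

    boolean isMany() {
        return elements == null;
    }

    // "Many" carries no information, so intersecting with it returns the
    // other operand unchanged.
    BoundedSet intersect(BoundedSet other) {
        if (isMany()) return other;
        if (other.isMany()) return this;
        Set<String> both = new TreeSet<String>(elements);
        both.retainAll(other.elements);
        return new BoundedSet(both);
    }
}
```

If one analysis reports {x, y, z} and the other {z, w, v}, a limit of 1 collapses both to "many" and the intersection stays unresolved, while a limit of 4 yields the single callee z. The measurements above suggest that such cases are rare in the example programs.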

9.7  Summary  of  Ajax  Performance 

9.7.1  Algorithm  Selection 

Based  on  the  results  above,  it  is  clear  that  the  intersection  analysis  SEMI  &  RTA++  is 
preferred  over  SEMI.  It  is  also  clear  that,  for  these  tools,  polymorphic  recursion  can  be 
turned  off  (Section  7.3.6)  with  little  accuracy  penalty.  SEMI’s  handling  of  higher-order 
code  should  be  enabled  if  the  program  being  analyzed  is  not  too  large. 

9.7.2  Summary  Results 

Now  I  compare  the  three  algorithms  RTA++,  SEMI  &  RTA++  with  HighOrder,  and  SEMI 
&  RTA++  without  HighOrder.  Figure  9-22  shows  the  accuracy  results  for  virtual  call 
resolution.  Figure  9-23  shows  the  space  requirements  and  Figure  9-24  shows  the  time  used. 

SEMI  is  far  more  expensive  than  RTA++  for  large  programs,  but  produces  much 
better  results. 

Figure 9-21. Effect of different set sizes on virtual call resolution accuracy

Figure 9-22. Accuracy of the three contending algorithms

9.7.3  Conclusions 

Clearly, SEMI is not scalable enough to handle very large programs. The limiting factor is
time. However, it does handle realistically-sized programs, and it provides a major
improvement over RTA for resolving virtual method calls. The task of identifying dead
application code is well solved by RTA and little improvement seems to be possible there.

10 Proving Downcast Safety


10.1  Introduction 

10.1.1  Parametric  Polymorphism  and  Downcasts 

Java  lacks  parametric  polymorphism.  Data  structures  such  as  containers,  which  would  be 
parametrically  polymorphic  if  the  language  permitted,  are  usually  implemented  by 
replacing  the  parameter  type  with  some  “generic”  type  which  is  a  supertype  of  the  possible 
instantiations of the parameter type. For example, a Java container class usually holds
references to objects of class Object. Methods to insert objects into the collection take a
parameter of class Object, and methods to extract objects return a value of class
Object.

For example, consider Figure 10-1. The class java.util.Vector declares the
methods addElement and elementAt, among others. To store and retrieve objects of
a particular known class, such as String in this case, one must use downcasts.


class Vector {
    public Vector() { ... }
    public final synchronized void addElement(Object obj) { ... }
    public final synchronized Object elementAt(int index) { ... }
}

static void main(String[] args) {
    Vector v = new Vector();
    v.addElement(args[0]);
    String s = (String)v.elementAt(0);
}

Figure  10-1.  Example  of  a  Java  generic  container  requiring  downcasts 

Without  the  downcast  to  String,  the  code  will  not  compile  because  the  result  of 
elementAt  is  not  known  to  be  assignable  to  a  String  object  reference.  The  information 
needed  to  prove  the  assignment  safe  without  the  downcast  would  normally  be  expressed 
using  parametric  polymorphism,  but  cannot  be  expressed  in  Java’s  type  system. 

10.1.2  Using  SEMI  To  Prove  Downcasts  Correct 

SEMI  is  effectively  a  type  inference  system  with  parametric  polymorphism.  SEMI  can 
reconstruct  type  parametricity  information  that  Java’s  type  system  cannot  express.  The 
most  straightforward  application  is  to  prove  that  certain  downcasts  will  always  succeed.  In 
the example above, Ajax will prove that the downcast to String always succeeds. A
compiler or run-time system could use this information to eliminate run-time checks
associated with the downcast. The programmer is assured that the types of elements in the
container are consistent with expectations.

The  rest  of  this  chapter  presents  the  design  of  the  Ajax  downcast  checking  tool,  which  is 
simple  given  the  Ajax  infrastructure.  I  present  some  quantitative  results  on  the  efficacy  of 
the  downcast  checker  on  my  example  programs.  These  results  also  include  some  interesting 
comparisons  between  different  analysis  configurations.  I  also  discuss  some  of  the 
especially  interesting  or  problematic  pieces  of  code  in  the  examples.  I  conclude  with  a 
comparison  of  Ajax  downcast  checking  to  support  for  parametric  polymorphism  in  the 
language,  and  a  discussion  of  some  other  similar  ways  to  use  Ajax. 

10.2  The  Downcast  Checking  Tool 

10.2.1  Interface  to  the  VPR 

Section  4.3.3  presents  the  design  of  a  VPR-based  tool  for  proving  downcasts  safe.  The  tool 
selects  a  set  of  occurrences  of  downcast  instructions  for  analysis;  by  default,  it  chooses  all 
the  downcasts  in  the  program  code  found  to  be  live.  Then,  using  the  VPR,  for  each 
downcast  instruction  it  computes  an  upper  bound  in  the  Java  class  hierarchy  for  the  classes 
of  all  objects  that  occur  as  operands  to  the  downcast  instruction.  This  bound  is  compared 
to  the  class  specified  by  the  downcast;  if  the  bound  is  equal  to  or  is  a  subclass  of  the 
specified  class,  the  downcast  is  reported  to  be  safe. 
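
As a rough illustration of the comparison step described above, the following sketch computes a least common superclass for a set of classes and tests it against the class named by the cast. It uses Java reflection in place of the Ajax class hierarchy, and the names DowncastCheckSketch, upperBound and isProvenSafe are hypothetical; they are not part of Ajax. The sketch assumes that the set of operand classes has already been obtained (in Ajax this is the job of the VPR query) and that it contains only class types, not interfaces or primitives.

```java
import java.util.List;

final class DowncastCheckSketch {
    // Least common superclass of a non-empty list of class types (no interfaces or primitives).
    static Class<?> upperBound(List<Class<?>> operandClasses) {
        if (operandClasses.isEmpty()) {
            throw new IllegalArgumentException("no operand classes");
        }
        Class<?> bound = operandClasses.get(0);
        for (Class<?> c : operandClasses) {
            // Walk up the hierarchy until the bound also covers c; Object covers every class.
            while (!bound.isAssignableFrom(c)) {
                bound = bound.getSuperclass();
            }
        }
        return bound;
    }

    // The downcast is reported safe if the bound is the target class or one of its subclasses.
    static boolean isProvenSafe(List<Class<?>> operandClasses, Class<?> castTarget) {
        return castTarget.isAssignableFrom(upperBound(operandClasses));
    }
}
```

For instance, if only String objects can reach a cast to String, the bound is String and the cast is reported safe; if an Integer can also reach it, the bound rises to Object and the cast is not proven safe.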

10.2.2  User  Interface 

The  downcast  checking  tool  is  exceptionally  simple  to  use.  The  user  specifies  the  program 
to  be  analyzed  by  giving  a  “class  path”  and  the  name  of  the  “main”  class.  The  tool  then 
prints  out  a  list  of  all  the  downcasts  that  were  found  in  live  code.  For  each  downcast,  the 
tool  prints  out  the  location  (method  name  and  instruction  offset),  the  class  specified  by  the 
instruction,  the  bound  actually  detected  by  the  analysis,  and  whether  or  not  the  downcast  is 
proven  safe. 

10.3  Quantitative  Results 

10.3.1  Proving  Downcasts  Safe  Using  RTA++ 

Section  5.4  describes  how  RTA  is  extended  with  intraprocedural  flow  analysis  to  track  the 
use  of  instanceof  in  conditional  expressions,  in  order  to  refine  the  type  information 
known  about  variables  at  certain  program  points.  This  information  can  be  used  to  prove  the 
downcast  safe  in  the  common  “typecase”  idiom  in  Java.  For  example,  given  the  code 

if (x instanceof C) {
    C c = (C) x;
}
it is easy for the Ajax downcast checking tool, using RTA++, to prove that the downcast is
safe. While this technique has been used by others [18], measurements of its effectiveness
have not previously been published.

Figure  10-2  shows  the  percentage  of  live  downcasts  proven  safe  using  basic  RTA  and  the 
RTA++  extension.  The  results  indicate  that  RTA++  is  effective  for  many  programs.  Note 
that  even  basic  RTA  can  sometimes  prove  a  downcast  safe,  for  example  when  an  abstract 
class  has  only  one  concrete  subclass  and  we  downcast  from  the  abstract  class  to  the 
subclass. 


10.3.2  Proving  Downcasts  Safe  Using  SEMI 

Figure  10-3  shows  the  results  of  using  SEMI  in  its  four  configurations  (with  or  without 
HighOrder  and  PolyRec). 

In  most  cases,  SEMI  alone  is  able  to  prove  more  downcasts  safe  than  RTA++,  although 
we  will  see  below  that  the  downcasts  it  proves  safe  are  different  from  the  ones  RTA++  can 
prove  safe.  As  shown  for  the  tools  in  the  previous  chapter,  unrestricted  polymorphic 
recursion  is  not  helpful  if  HighOrder  is  enabled.  However,  when  HighOrder  is  disabled, 
the  situation  is  different:  unrestricted  polymorphic  recursion  significantly  improves 
downcast  checking. 

10.3.3  Proving  Downcasts  Safe  Using  SEMI  with  RTA++ 

Taking  the  intersection  of  the  information  obtained  by  SEMI  with  that  obtained  by  RTA++, 
as  described  in  Section  4.4.5,  gives  the  best  of  both  worlds.  Figure  10-4  shows  the  results 
of  using  SEMI  &  RTA++  (with  full  polymorphic  recursion)  compared  to  SEMI  or  RTA++ 
alone. 
[Figure omitted; axis label: "Downcasts Proven Safe"]


One  can  see  that  the  number  of  downcasts  proven  safe  by  SEMI  &  RTA++  is  close  to 
the  sum  of  the  downcasts  proven  safe  by  SEMI  and  RTA++.  This  is  unsurprising.  To  a 
rough  approximation,  RTA++  resolves  downcasts  introduced  because  Java  lacks  sum  types 
(see  Section  5.4.1),  and  SEMI  resolves  downcasts  introduced  because  Java  lacks  type 
parametricity.

There  is  an  oddity  in  the  results  for  the  Java2HTML  example:  SEMI  &  RTA++  obtains  a 
worse  percentage  of  downcasts  proven  safe  than  RTA++  alone.  This  is  because 
Java2HTML  is  a  very  small  program;  RTA++  finds  only  fifteen  live  downcasts  and  proves 
four  of  them  safe,  but  SEMI  &  RTA++  finds  only  thirteen  live  downcasts,  proving  two  of 
them  safe.  That  is,  SEMI  &  RTA++  proved  that  two  of  RTA++’s  safe  downcasts  are 
actually  dead  code,  and  excluded  them  from  its  results. 
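
Working through the counts quoted above makes the effect concrete:

```latex
\[
\frac{4}{15} \approx 26.7\% \quad \text{(RTA++ alone)}, \qquad
\frac{2}{13} \approx 15.4\% \quad \text{(SEMI \& RTA++)}
\]
```

The two downcasts that leave the numerator are the same two that leave the denominator, so the percentage falls even though no downcast in live code has become harder to prove.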

10.3.4  Summary 

Figure  10-5  shows  the  overall  results  using  the  best  analyses  available.  The  results  for 
SEMI(HighOrder+PolyRec) & RTA++ are almost identical to those for SEMI(HighOrder)
& RTA++.


[Chart omitted. Axis: Example Program. Series: RTA++; SEMI(PolyRec) & RTA++; SEMI(HighOrder) & RTA++]

Figure 10-5. Overall results

For  some  large,  realistic  programs  —  Jar,  JavaCC,  and  JavaP  —  Ajax  is  able  to  prove 
the  safety  of  more  than  50%  of  the  downcasts. 

Unfortunately,  the  accuracy  seems  to  deteriorate  as  programs  get  larger.  Many  fewer 
downcasts  are  resolved  in  JavaC,  JavaFIG  and  Ladybug  than  in  the  other  programs.  From 
these  results,  it  is  hard  to  tell  whether  this  is  because  of  the  kind  of  code  people  write  in 
larger  programs,  or  whether  there  is  some  more  subtle  reason.  Anecdotal  evidence  suggests 
that larger programs are more likely to contain sections of “difficult” code that destroy the
quality  of  the  analysis  results  in  a  non-local  way.  This  is  discussed  further  below. 

10.4  Unresolvable  Downcasts 

I  have  already  mentioned  the  kind  of  code  for  which  SEMI  &  RTA++  can  prove  downcast 
safety.  In  this  section  I  focus  on  some  negative  examples  —  usage  patterns  for  downcasts 
that  SEMI  &  RTA++  is  unable  to  handle. 

10.4.1  Confusion  Involving  Sum  Types 

A  useful  example  is  Sun’s  Java  disassembler  JavaP.  Analyzed  by  SEMI  &  RTA++  with 
polymorphic  recursion  and  higher-order  treatment,  it  is  found  to  have  38  live  downcasts  of 
which  21  are  proven  safe. 

One of the downcasts not proven safe is at offset 8 in
sun.tools.util.LoadEnvironment.getClassDeclaration. This
downcast is applied after extracting an object from a Hashtable containing
ClassDeclarations. The problem is that the same ClassDeclaration objects are
also  placed  into  a  container  of  general  “constant  pool  items”,  which  include  Strings, 
Integers  and  other  constants.  The  unification  behavior  of  SEMI  leads  it  to  conclude  that 
those  other  constants  may  also  be  present  in  the  Hashtable.  This  is  one  example  of  a 
common  class  of  problems:  the  use  of  sum  types  in  one  context  causes  inaccuracy  in 
another  context.  Most  of  the  failures  to  resolve  downcasts  in  JavaP  can  be  traced  back  to 
this  problem  with  the  “constant  pool”. 

Flow  sensitive  analysis  techniques  could  help  to  reduce  the  damage  caused  by  the  use  of 
such  sums. 
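
The pattern described in this section can be illustrated with a small sketch. The class and field names below (SumTypeConfusionSketch, ClassDecl, declarations, constantPool) are invented for illustration and are not taken from the JavaP sources. The point is only that one object flows into two containers, one homogeneous and one mixed; an analysis that unifies the element types of the two containers can no longer tell that the first holds only ClassDecl objects.

```java
import java.util.Hashtable;
import java.util.Vector;

final class SumTypeConfusionSketch {
    // Hypothetical stand-in for ClassDeclaration.
    static final class ClassDecl {
        final String name;
        ClassDecl(String name) { this.name = name; }
    }

    // Holds only ClassDecl values, although its declared element type cannot say so.
    static final Hashtable<Object, Object> declarations = new Hashtable<>();

    // A "sum type" container: Strings, Integers and ClassDecls all share it.
    static final Vector<Object> constantPool = new Vector<>();

    static void register(ClassDecl d) {
        declarations.put(d.name, d);
        constantPool.addElement(d);   // the same object also enters the mixed container
    }

    static ClassDecl lookup(String name) {
        // Always succeeds at run time, but is hard to prove once the element
        // types of the two containers have been unified.
        return (ClassDecl) declarations.get(name);
    }
}
```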

10.4.2  “Out  Of  Band”  Dynamic  Type  Knowledge 

Another  generally  common  problem  that  occurs  in  JavaP  is  the  use  of  special  knowledge 
to  discriminate  sum  types.  For  example,  JavaP  code  often  assumes  that  certain  constant 
pool  items  have  certain  types,  based  on  arithmetic  invariants  governing  indices  into  the 
constant  pool  array  (e.g.,  two  halves  of  a  64-bit  value  are  always  stored  at  consecutive 
locations  in  the  array).  It  then  downcasts  to  the  known  type  without  any  guarding 
instanceof  check. 

Another  example  is  the  method 

sun.tools.java.MethodType.equalArguments(sun.tools.java.Type)

The parameter is downcast to a MethodType without checking, because other code
establishes a precondition that the parameter is indeed a MethodType. Propagating such
invariants  interprocedurally  would  require  more  sophisticated  analysis  than  that  provided 
by  Ajax. 
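
A minimal sketch of the "out of band" idiom follows. The layout invariant used here (Strings in even slots, Integers in odd slots) is made up for the example and is not JavaP's actual constant pool layout; the names OutOfBandSketch, nameAt and valueAt are likewise hypothetical. What matters is that the casts are justified by arithmetic on the index rather than by any instanceof test that an analysis could observe.

```java
final class OutOfBandSketch {
    // Invariant maintained by hand: even slots hold Strings, odd slots hold Integers.
    static final Object[] pool = { "name", Integer.valueOf(7), "type", Integer.valueOf(9) };

    // Both casts rely on the index invariant; there is no guarding instanceof check.
    static String nameAt(int pair) {
        return (String) pool[2 * pair];
    }

    static Integer valueAt(int pair) {
        return (Integer) pool[2 * pair + 1];
    }
}
```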
10.5 Conclusions


10.5.1  Summary 

The  Ajax  downcast  checking  tool  is  able  to  prove  more  than  half  of  the  downcasts  correct 
for  some  real  programs.  However,  as  programs  get  larger  the  accuracy  decreases.  This 
appears to be because as the program gets larger, there is an increasing chance of
encountering some code idiom that pollutes the results for a large fraction of the program. The use
of  sums  is  often  the  culprit. 

10.5.2  Other  Applications 

Proving  the  safety  of  downcasts  could  be  useful  for  Java  run-time  systems  as  well  as 
programmers.  Many  Java  programs  could  be  sped  up  by  eliminating  the  run-time  checks. 

Another  use  of  this  technology  would  be  to  reverse  engineer  type  parametricity  in  existing 
Java programs, in order to translate them into a language that supports parametricity such
as  Generic  Java  [13].  It  would  not  be  difficult  to  implement  such  a  tool  based  on  the  tools 
I  have  already  built. 

10.5.3  Limitations  of  Downcast  Checking 

Checking  downcasts  is  not  the  only  use  of  type  parametricity  information,  and  checking 
downcasts  does  not  produce  all  the  benefits  that  a  language  with  parametric  polymorphism 
provides. For example, in Java it is common to implement a set using a Hashtable where
objects  are  put  into  the  Hashtable,  and  the  presence  of  keys  is  tested  using  a  method 
returning  a  boolean  value,  but  no  object  extraction  (and  downcasting)  ever  occurs. 
Downcast  checking  will  say  that  everything  is  safe  even  if  all  sorts  of  different  objects  are 
added  to  the  set.  In  a  language  with  parametric  polymorphism,  the  user  could  declare  the 
desired element type and the language would detect any usage inconsistent with the
declaration.
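
The set idiom mentioned above might look like the following sketch (the names HashtableSetSketch, names, add and contains are hypothetical). No element is ever extracted and cast, so there is no downcast for the checker to flag, even when an object of an unintended class is added.

```java
import java.util.Hashtable;

final class HashtableSetSketch {
    // Intended to be a set of Strings, but nothing in the code says so.
    static final Hashtable<Object, Object> names = new Hashtable<>();

    static void add(Object o) {
        names.put(o, Boolean.TRUE);
    }

    // Membership is tested with a boolean result; no object is extracted or downcast.
    static boolean contains(Object o) {
        return names.containsKey(o);
    }
}
```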

A  completely  automatic  tool  cannot  detect  such  errors.  Without  user  annotations,  or  at  least 
some  heuristics,  it  is  impossible  to  determine  the  intended  type  parametricity  of  a  data 
structure.  If  such  annotations  were  available,  then  it  would  be  easy  to  design  an  Ajax  tool 
to  check  them. 
11 Ajax Object Models


11.1  Introduction 

In  this  chapter,  I  describe  what  object  models  mean  in  Ajax,  and  how  Ajax  can  construct 
them.  Then  I  present  examples  taken  from  real  programs,  and  discuss  the  advantages  and 
disadvantages  of  using  Ajax  to  construct  these  object  models. 

11.1.1  Overview  of  Object  Models 

An  object  model  is  a  graph-based  abstraction  of  a  set  of  program  states.  In  this  thesis,  each 
node  represents  a  collection  of  runtime  objects  that  occur  in  the  states.  Edges  represent 
relationships  between  the  collections,  such  as  class  inheritance  and  field  reference. 

For example, Figure 11-1 shows an object model for the program in Figure 11-2. A dotted
edge  indicates  an  inheritance  relationship.  A  solid  line  represents  a  field  edge,  labelled  with 
the  name  of  the  referring  field.  Each  node  is  labelled  with  the  class  name  of  the  objects  it 
represents.  For  example,  from  this  diagram  we  can  see  at  a  glance  that  X  has  two  fields 
referring  to  Y  objects,  some  of  which  may  actually  be  of  class  Z. 


Figure  11-1.  A  class  hierarchy  object  model 


This  object  model  was  obtained  directly  from  the  program’s  class  declarations.  However, 
more elaborate object models are possible and useful. For example, Figure 11-3 shows
another object model for the same program. This object model reveals more information,
such as the fact that X’s y1 and y2 fields both refer specifically to objects of class Y and
not  Z.  This  information  cannot  be  obtained  from  the  class  declarations  alone;  different 
objects  of  class  Y  must  be  represented  by  different  nodes. 
class X {
    Y y1;
    Y y2;

    X() {
        y1 = new Y(this);
        y2 = new Y(... ? new Z() : this);
    }

    static void main(String[] args) {
        X x = new X();
    }
}

class Y {
    Object contents;

    Y(Object p) {
        contents = p;
    }
}

class Z extends Y {
    String s = "Football";

    Z() {
        super(s);
    }
}

Figure  11-2.  An  example  Java  program 


Figure  11-3.  A  richer  object  model 


An  object  model  is  a  directed  graph.  Each  node  in  the  graph  is  associated  with  a  set  of 
runtime  objects.  There  are  two  kinds  of  edges:  field  edges,  labelled  with  field  names,  and 
inheritance  edges,  which  are  unlabelled.  A  field  edge  from  A  to  B  labelled  F  indicates  that 
at least one of A’s objects has a field F containing a reference to an object in B. An
inheritance edge from A to B indicates that B’s objects are a subset of A’s objects.
The class hierarchy of a Java program can be interpreted as an object model. Each node
corresponds  to  a  class  C,  and  is  associated  with  the  set  of  objects  of  class  C  or  some  subclass 
of  C.  Field  edges  are  drawn  from  C’s  node  to  the  nodes  corresponding  to  the  declared  class 
types  of  the  object  reference  fields  declared  in  C.  Inheritance  edges  are  drawn  from  each 
class  to  its  subclasses. 

Object  models  visualize  the  structure  of  a  program’s  data.  In  object-oriented  programs,  the 
structure  of  the  data  reflects  the  overall  organization  of  the  program.  Programmers  can  use 
object  models  to  capture  this  organization  graphically. 

An  object  model  can  be  thought  of  as  a  static  projection  of  all  possible  runtime  heap  states 
of  a  program. 

11.1.2  A  Definition  of  Object  Models 

The  following  definition  is  as  flexible  as  possible  to  accommodate  various  ideas  about  what 
an  object  model  is,  how  it  can  be  constructed,  and  how  it  can  be  used. 

The  class  hierarchy  object  model  has  the  following  properties: 

1. The field edges are sound; field relationships in all program states are reflected in the
model. Formally, if in some program state an object O1 has a field F containing a
reference to object O2, and O1 and O2 are represented in the model (i.e., they are associated
with at least one node), then there are nodes A and B and a field edge from A to B
labelled F such that O1 is associated with A and O2 is associated with B.

For example, in Figure 11-3, in the final program state, x.y2 refers to an object
associated with the Y'' node. Since the object x is associated with node X, an edge labelled
y2 must be drawn from node X (or node Object) to node Y''.

2. Inheritance edges obey the subset relationship: if O1 is associated with node B, and
there is an inheritance edge from A to B, then O1 is associated with node A.

In Figure 11-3, all objects associated with node Z must also be associated with the Y
node and the Object node.

3.  Every  object  has  a  “most  specific”  node:  if  O  is  associated  with  nodes  A  and  B,  then 
there  is  a  node  C  such  that  O  is  associated  with  C  and  there  is  a  path  in  the  inheritance 
edges from A to C and from B to C.

The  most  specific  node  for  x  in  the  example  is  the  node  labelled  X.  There  is  a  path  from 
the other node associated with x (Object) to the most specific node.

4.  If  there  is  a  field  edge  E  from  A  to  B  labelled  F,  and  a  node  C  such  that  there  is  a  path  in 
the  inheritance  edges  from  C  to  A,  and  C  has  an  outgoing  field  edge  labelled  F,  then  A 
equals  C  and  that  edge  is  E  itself. 

For  example,  it  would  not  be  permissible  to  have  an  edge  emanating  from  node  Y 
labelled  s,  unless  the  s-edge  emanating  from  node  Z  was  deleted. 




We  take  these  properties  as  definitional,  and  call  any  graph  satisfying  them  an  object 
model.  Property  1  is  useful  because  it  assigns  meaning  to  the  field  edges  of  the  graph  — 
more  precisely,  it  assigns  meaning  to  the  absence  of  field  edges  in  the  graph.  Properties  2 
and  3  impose  structure  on  the  associations  between  nodes  and  objects;  in  particular 
property  3  means  that  given  a  map  from  each  object  to  its  “most  specific”  node  (e.g.,  its 
class),  we  can  find  all  the  nodes  associated  with  any  given  object.  Property  4  guarantees 
that  each  field  of  an  object  maps  to  at  most  one  edge  in  the  model. 
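
To make the definition more tangible, here is a small sketch of an object model as a data structure, together with a check for property 4. It is only an illustration under stated assumptions: nodes are identified by strings, the class and method names (ObjectModelSketch, addFieldEdge, addInheritanceEdge, satisfiesProperty4) are hypothetical, and inheritance edges run from a node to the nodes whose objects are a subset of its own, as in the class hierarchy model.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ObjectModelSketch {
    // node -> field label -> target nodes
    final Map<String, Map<String, Set<String>>> fieldEdges = new HashMap<>();
    // inheritance edges: node -> nodes whose objects are a subset of its objects
    final Map<String, Set<String>> inheritanceEdges = new HashMap<>();

    void addFieldEdge(String from, String label, String to) {
        fieldEdges.computeIfAbsent(from, k -> new HashMap<>())
                  .computeIfAbsent(label, k -> new HashSet<>())
                  .add(to);
    }

    void addInheritanceEdge(String from, String to) {
        inheritanceEdges.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    boolean hasField(String node, String label) {
        return fieldEdges.getOrDefault(node, Map.of()).containsKey(label);
    }

    // Property 4: a label F occurs on at most one field edge along any inheritance path.
    boolean satisfiesProperty4() {
        for (Map.Entry<String, Map<String, Set<String>>> e : fieldEdges.entrySet()) {
            String c = e.getKey();
            for (Map.Entry<String, Set<String>> f : e.getValue().entrySet()) {
                if (f.getValue().size() > 1) {
                    return false;   // two F-edges leave the same node
                }
                // No other node reachable from c by inheritance edges may have an F-edge.
                Deque<String> work = new ArrayDeque<>(inheritanceEdges.getOrDefault(c, Set.of()));
                Set<String> seen = new HashSet<>();
                while (!work.isEmpty()) {
                    String a = work.pop();
                    if (!seen.add(a)) {
                        continue;
                    }
                    if (!a.equals(c) && hasField(a, f.getKey())) {
                        return false;
                    }
                    work.addAll(inheritanceEdges.getOrDefault(a, Set.of()));
                }
            }
        }
        return true;
    }
}
```

With nodes Object, Y and Z, a contents edge leaving Y and an s edge leaving Z, the check succeeds; adding a second s edge leaving Y makes it fail, which mirrors the example given for property 4.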

The  class  hierarchy  model  has  the  following  additional  “completeness”  properties: 

5. Objects are complete: given an object O1 containing a field F, a node A such that O1 is
associated with node A, and an object O2 such that O1.F = O2, then for some node B
there is an edge in the model from A to B labelled F.

6. All objects are included: given an object O1, there is a node A such that O1 is associated
with node A.

A  useful  object  model  need  not  satisfy  these  properties.  The  object  models  created  by  Ajax 
satisfy  property  5  but  not  property  6. 

11.2  Computing  Object  Models  with  Ajax 

Ajax  includes  an  object  modelling  tool  based  on  the  VPR.  Building  object  models  requires 
extensive  post-processing  of  the  raw  value-point  relation.  This  section  describes  this 
processing,  first  giving  the  series  of  steps  required,  and  then  elaborating  on  the  difficult 
steps. 

11.2.1  Overview 

Previous  work  on  object  model  construction  [46]  starts  with  a  class  hierarchy  and  applies 
transformations  to  obtain  more  refined  models.  In  contrast,  Ajax  builds  a  refined  object 
model  and  then  applies  transformations  to  simplify  the  model. 

•  Ajax  first  constructs  a  simple  model  that  uses  no  inheritance  edges  and  does  not  obey 
property  4  (unique  field  edges).  The  model  associates  each  object  with  at  most  one 
node.  This  model  is  simply  a  conservative  static  approximation  to  the  heap  graph 
reachable from a given set of “root objects”, specified by bytecode expressions
provided by the user. Property 5 (“object completeness”) is obeyed, but not property 6
(because  not  all  objects  are  included).  The  construction  of  this  heap  graph  is  described 
in  more  detail  in  Section  11.2.2. 

Figure 11-4 gives this basic model for the program in Figure 11-2. The root objects are
the objects to which the expression x in the main method evaluates. Note that the node
“some other Y” has two outgoing edges labelled contents, violating property 4.

•  Next,  a  simple  object  model  is  obtained  from  the  heap  graph  by  merging  nodes  in  order 
to  satisfy  property  4.  That  is,  whenever  we  have  a  node  A  with  two  outgoing  field  edges 
labelled  F  to  nodes  B  and  C,  we  merge  nodes  B  and  C  and  delete  one  of  the  field  edges. 

In  the  example,  Ajax  merges  the  X  and  Z  nodes;  see  Figure  11-5. 




Figure  11-4.  Ajax  heap  graph 


Figure  11-5.  Ajax  heap  graph  with  unique  field  edges  (simple  object  model) 


•  In  the  next  pass,  each  node  explodes  into  a  set  of  subnodes,  one  for  each  class  of  objects 
associated  with  the  node  and  one  for  each  of  their  superclasses.  An  inheritance  edge  is 
introduced  between  each  class  and  its  superclass.  The  origin  of  each  field  edge  is  set  to 
the  subnode  for  the  class  in  which  the  field  is  declared.  The  target  is  the  subnode  of  the 
original  target  node  for  the  class  the  field  is  declared  as. 

See Figure 11-6. The rounded boxes group the subnodes extracted from each original
node. For example, the node “some Z, some X” is exploded into four nodes: one for
class Z, one for class X, and one each for their superclasses Y and Object. The edge
for field y1 has its origin at the subnode for X, because field y1 is declared in class X.
The edge points to the subnode for class Y because y1 is declared as class Y.

•  Sometimes  the  target  of  a  field  edge  is  known  to  be  of  a  more  specific  class  than  the 
declared  class.  (This  information  is  obtained  by  a  separate  Ajax  query  to  compute  the 
most  specific  common  superclass  of  the  target  objects.)  The  field  edge  is  retargeted  to 
the  more  specific  class. 

For  example,  in  Figure  11-6,  Z’s  field  contents  is  known  to  contain  only  Strings. 
The  edge  is  updated  to  point  to  the  String  node. 
Figure 11-6. Ajax object model with classes and inheritance

• In Figure 11-6, three of the Object nodes are not useful because the only edges
incident to them are outgoing inheritance edges. All such nodes are deleted, giving
Figure  11-7.  Since  this  can  create  more  nodes  incident  only  to  outgoing  inheritance 
edges,  the  operation  is  repeated  until  no  applicable  nodes  remain.  Other  pruning  can 
also  be  performed  at  this  stage;  this  is  discussed  in  more  detail  in  Section  11.2.3. 


Figure  11-7.  Ajax  object  model  with  superclass  suppression 
• In a final (optional) pass, Ajax identifies isomorphic subgraphs within the model and
merges  them  to  save  space.  Figure  11-7  does  not  contain  any  isomorphic  subgraphs; 
therefore  it  is  the  graph  produced  by  Ajax  for  the  example  program.  This  is  the  same  model 
shown  in  Figure  11-3. 
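
The node-merging pass in the second step above (merging targets until every node has at most one outgoing edge per field label) can be sketched with a union-find structure. This is an illustration under assumptions, not the Ajax implementation: nodes are small integers, and the names UniqueFieldEdgesSketch, addEdge, find and mergeUntilUnique are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class UniqueFieldEdgesSketch {
    private final int[] parent;                             // union-find forest over node ids
    private final List<int[]> edges = new ArrayList<>();    // each edge is {from, labelId, to}
    private final List<String> labels = new ArrayList<>();

    UniqueFieldEdgesSketch(int nodeCount) {
        parent = new int[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            parent[i] = i;
        }
    }

    void addEdge(int from, String label, int to) {
        int id = labels.indexOf(label);
        if (id < 0) {
            labels.add(label);
            id = labels.size() - 1;
        }
        edges.add(new int[] { from, id, to });
    }

    // Representative of the merged node that n now belongs to.
    int find(int n) {
        while (parent[n] != n) {
            parent[n] = parent[parent[n]];
            n = parent[n];
        }
        return n;
    }

    // Merge targets until each merged node has one target per field label.
    void mergeUntilUnique() {
        boolean changed = true;
        while (changed) {
            changed = false;
            Map<Long, Integer> target = new HashMap<>();    // (source, label) -> target
            for (int[] e : edges) {
                long key = ((long) find(e[0]) << 32) | e[1];
                int to = find(e[2]);
                Integer prev = target.putIfAbsent(key, to);
                if (prev != null && find(prev) != to) {
                    parent[to] = find(prev);                // merge the two targets
                    changed = true;
                }
            }
        }
    }
}
```

Merging two targets can create new conflicts among their own outgoing edges, which is why the pass repeats until a full sweep makes no change.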

11.2.2  Computing  Heap  Graphs  With  The  VPR 

The  first  step  is  to  construct  a  heap  graph.  Clearly  the  VPR  is  not  a  natural  encoding  of  a 
heap  graph;  we  must  extract  a  heap  graph  using  Ajax  queries. 

11.2.2.1  Approach 

Suppose  a  “root  expression”  exp  is  given.  This  expression  can  be  chosen  by  the  user  as 
described  in  Section  11.2.4. 

Ajax  constructs  a  heap  graph  with  a  root  node  representing  the  objects  to  which  exp 
evaluates. Then, for each field name F in the program, it checks whether exp.F ↔ exp.F.
If not, then the objects for the root node never have a field F, or their F fields always contain
null. Otherwise Ajax adds a field edge labelled F, emanating from the root node and
pointing to a new node: the node representing the objects to which “exp.F” evaluates. We repeat
this  procedure,  taking  each  new  node  and  adding  outgoing  edges  for  its  fields,  building  a 
tree  representing  the  objects  reachable  from  the  root  objects. 

Many  nodes  in  the  tree  may  correspond  to  overlapping  (or  identical)  sets  of  objects. 
Therefore  we  test,  for  each  pair  of  nodes,  whether  the  expressions  associated  with  the  nodes 
are  related  by  the  value-point  relation.  If  the  expressions  are  related  then  we  merge  the 
nodes.  This  means  that  the  tree  may  become  a  general  graph. 

11.2.2.2  Method 

The  procedure  is  shown  in  Figure  11-8. 

It  is  impractical  to  build  such  a  tree  and  then  subsequently  merge  the  nodes.  The  initial  tree 
is  simply  too  large,  and  in  the  case  of  cyclic  data  structures,  it  may  even  be  infinite.  Instead, 
before  creating  a  new  node  (label  3),  Ajax  checks  to  see  whether  the  node’s  expression  is 
related  to  any  of  the  expressions  associated  with  already  existing  nodes  (label  1).  If  so  then 
the  new  node  need  not  be  created;  the  matching  existing  node  is  used  instead  (label  2). 
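
A compact Java rendering of the basic procedure may help. It is a sketch under assumptions rather than the Ajax code: the VPR is abstracted as a predicate on pairs of expression strings, and the names HeapGraphSketch, build, concreteOracle and eval are hypothetical. The toy oracle evaluates expressions on one concrete heap, so it trivially has the substitutability property discussed in Section 11.2.2.3.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

final class HeapGraphSketch {
    final List<String> nodeExprs = new ArrayList<>();   // the map M: node index -> expression
    final Set<String> edges = new LinkedHashSet<>();     // edges written as "n1 -F-> n2"

    static HeapGraphSketch build(String rootExpr, List<String> fields,
                                 BiPredicate<String, String> related) {
        HeapGraphSketch g = new HeapGraphSketch();
        g.nodeExprs.add(rootExpr);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String f : fields) {
                for (int n1 = 0; n1 < g.nodeExprs.size(); n1++) {
                    String target = g.nodeExprs.get(n1) + "." + f;
                    if (!related.test(target, target)) {
                        continue;                        // field absent or always null
                    }
                    boolean hasEdge = false;
                    for (int n2 = 0; n2 < g.nodeExprs.size(); n2++) {            // label 1
                        if (related.test(target, g.nodeExprs.get(n2))) {
                            changed |= g.edges.add(n1 + " -" + f + "-> " + n2);  // label 2
                            hasEdge = true;
                        }
                    }
                    if (!hasEdge) {                                              // label 3
                        g.nodeExprs.add(target);
                        g.edges.add(n1 + " -" + f + "-> " + (g.nodeExprs.size() - 1));
                        changed = true;
                    }
                }
            }
        }
        return g;
    }

    // Toy stand-in for the VPR: the heap maps variables and "object.field" keys to objects.
    static BiPredicate<String, String> concreteOracle(Map<String, String> heap) {
        return (e1, e2) -> {
            String v = eval(heap, e1);
            return v != null && v.equals(eval(heap, e2));
        };
    }

    private static String eval(Map<String, String> heap, String expr) {
        String[] parts = expr.split("\\.");
        String obj = heap.get(parts[0]);
        for (int i = 1; i < parts.length && obj != null; i++) {
            obj = heap.get(obj + "." + parts[i]);
        }
        return obj;
    }
}
```

On a two-element cyclic list reachable from x, the construction stops after creating two nodes, because x.next.next is found to be related to the existing root node instead of producing a third node.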

11.2.2.3  Correctness 

Using  the  standard  value-point  relation,  the  above  procedure  is  not  sound.  It  assumes  that 
when  two  nodes  are  related  in  the  VPR,  they  have  exactly  the  same  behavior.  More 
precisely, the algorithm above is only correct if the VPR has the substitutability property:

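\[ \forall e_1, e_2.\; e_1 \leftrightarrow e_2 \Rightarrow (\forall e.\; e_1 \leftrightarrow e \Leftrightarrow e_2 \leftrightarrow e) \land (\forall e, F.\; e_1.F \leftrightarrow e \Leftrightarrow e_2.F \leftrightarrow e) \]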

This means that if e1 and e2 are related, substitution of one for the other does not change
whether  an  expression  pair  is  in  the  VPR. 

This property is not implied by the definition of the VPR. Consider the example in
Figure 11-9. According to the VPR, f:x ↔ f:y and f:y.length ↔ f:len. However,
substituting x for y, f:x.length ↔ f:len does not hold. Informally, the reason is that
Initialize the graph G to contain a single node, the root
Let M be a map from G's nodes to expressions
Initialize the map M to map the root node to exp
Repeat {
    For each field F in the program {
        For each node N1 in G {
            If M(N1).F <-> M(N1).F is in the VPR {
1:              For each node N2 in G {
                    If M(N1).F <-> M(N2) is in the VPR {
                        If there is no edge from N1 to N2 labelled F {
2:                          Add to G an edge from N1 to N2 labelled F
                        }
                    }
                }
                If N1 has no outgoing edge labelled F {
3:                  Create a new node N
                    Extend M with a mapping from N to M(N1).F
                    Add to G an edge from N1 to N labelled F
                }
            }
        }
    }
} Until G does not change

Figure  11-8.  Basic  heap  graph  construction  algorithm 


static void f(Object x, Object y, int len) {
}

static void main(String[] args) {
    String[] zoo = { "lion", "tiger" };
    f(zoo, zoo, args.length);
    f(zoo, args, args.length);
}

Figure  11-9.  Example  of  substitutability  violation 

the  two  antecedent  relation  pairs  hold  in  different  contexts,  so  no  conclusion  can  be  drawn 
from  their  conjunction. 
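
The violation can be reproduced in a toy model. The sketch below is not the formal VPR of the earlier chapters; it assumes a simplified reading in which two expressions are related if some single call of f gives both the same value, and it represents values by symbolic tokens such as "zoo.length". The names SubstitutabilitySketch, CONTEXTS and related are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class SubstitutabilitySketch {
    // One map per call of f in Figure 11-9: expression -> token for the value it denotes.
    static final List<Map<String, String>> CONTEXTS = List.of(
        // f(zoo, zoo, args.length)
        Map.of("x", "zoo", "y", "zoo", "len", "args.length",
               "x.length", "zoo.length", "y.length", "zoo.length"),
        // f(zoo, args, args.length)
        Map.of("x", "zoo", "y", "args", "len", "args.length",
               "x.length", "zoo.length", "y.length", "args.length"));

    // Related if some single context gives both expressions the same value.
    static boolean related(String e1, String e2) {
        for (Map<String, String> ctx : CONTEXTS) {
            String v = ctx.get(e1);
            if (v != null && v.equals(ctx.get(e2))) {
                return true;
            }
        }
        return false;
    }
}
```

In this model x and y are related through the first call, and y.length and len through the second, but no single call relates x.length to len.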

11.2.2.4  Solution 

Therefore,  the  object  modelling  tool  notifies  the  analysis  that  it  must  produce  a  VPR 
approximation  satisfying  the  substitutability  property.  For  increased  flexibility,  the  tool 
specifies a program point l at which expressions must be substitutable; all other expressions
need  not  be  substitutable.  The  exact  property  demanded  is: 

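\[ \forall e_1, e_2.\; l{:}e_1 \leftrightarrow l{:}e_2 \Rightarrow (\forall e.\; l{:}e_1 \leftrightarrow e \Leftrightarrow l{:}e_2 \leftrightarrow e) \land (\forall e, F.\; l{:}e_1.F \leftrightarrow e \Leftrightarrow l{:}e_2.F \leftrightarrow e) \]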

This  suffices  because  all  queries  required  to  build  the  heap  graph  are  based  on  one  or  more 
root  expressions,  which  are  all  at  the  same  program  point.  Limiting  the  property  to  one 
program  point  means  that  other  queries  using  the  same  VPR  approximation  (e.g.,  the 
liveness  query  used  to  limit  the  scope  of  the  analysis)  are  not  seriously  impacted. 
11.2.2.5 Implementing Substitutability In RTA++

It is easy to enforce substitutability in RTA++. We simply assign the static bytecode type
TOP to any expression of the form l:e, where l is the program point where substitutability
is required. This ensures that every such expression is related to all other expressions in the
computed VPR.

This approximation is not particularly useful, because it implies l:e1 ↔ l:e2 regardless of
the values of e1 and e2, so using RTA++ alone, the heap graph will collapse to a point.
Unfortunately it is necessary. For suppose that for some e, l:e has Java type Object.
(The existence of such an e is almost certain in practice.) Then for any e1 and e2 such that
l:e1 and l:e2 have Java class types, RTA++ will give l:e ↔ l:e1 and l:e ↔ l:e2. The
substitutability property then requires that l:e1 ↔ l:e2.

Therefore  RTA++  alone  is  not  suitable  as  the  analysis  engine  for  the  Ajax  object  modeling 
tool. 

11.2.2.6  Implementing  Substitutability  In  SEMI 

Suppose that l:e1 and l:e2 both map to SEMI constraint variables that have no instance
constraints emanating from them. Then in SEMI, l:e1 ↔ l:e2 if and only if l:e1 and l:e2
map to the same constraint variable. If indeed they map to the same constraint variable, the
substitutability property is satisfied for l:e1 and l:e2, because SEMI’s VPR is a function
of the constraint variables mapped to by the expressions.

Therefore, to enforce the substitutability property in SEMI, I force all expressions of the
form l:e to have no instance constraints emanating from them, by forcing their constraint
variables to be global (see Section 7.6.3).

11.2.2.7  Improving  The  Heap  Graph  Algorithm 

The  algorithm  described  above  is  rather  inefficient.  The  implementation  of  the  object 
modelling  tool  speeds  it  up  by  exploiting  the  power  of  the  Ajax  interface.  The  algorithm  is 
presented in Figure 11-10.

The improved algorithm uses a series of iterations. It maintains a set of “fringe” nodes, the
nodes added in the last iteration (set S). At each step, the fields of the fringe nodes are
examined  and  potential  new  target  nodes  for  those  fields  are  created  (label  1).  A  new  node 
that  is  related  to  an  existing  node  is  merged  into  the  existing  node  (label  2).  New  nodes  that 
are  related  to  each  other  are  merged  (label  4).  New  nodes  that  are  not  even  related  to 
themselves  are  deleted  (label  3).  (The  field  never  refers  to  any  objects.)  Surviving  new 
nodes  are  added  to  the  graph  (label  5)  and  become  the  new  fringe  set. 

11.2.2.8  Reducing  Space  Consumption 

The  above  algorithm  exploits  the  Ajax  interface,  but  peak  memory  usage  can  still  be  very 
large:  accumulating  the  complete  set  of  source  nodes  matching  each  target  node  can  require 
space  quadratic  in  the  number  of  candidate  new  nodes. 

Another  improvement  to  the  algorithm  reduces  peak  space  consumption.  The  basic  idea  is 
to  compute  just  one  or  two  elements  of  the  set  of  source  nodes  reaching  each  target  node. 
This is enough information to merge nodes. The query repeats several times, merging nodes
after each iteration, until the algorithm converges to the same state it would have reached
in one step of the previous algorithm.

Initialize the graph G to contain a single node, the root
Let S (the fringe set) contain the root node
Let M be a map from G's nodes to expressions
Initialize the map M to map the root node to exp
While S is nonempty {
    Let T be the new fringe set, initially empty
    Let T_M be an empty map from T's nodes to expressions
    Let P be an empty map from nodes to sets of (node, field) pairs
    // P(n) records edges to be created pointing into node n
    For each nonstatic field F in the program {
        For each element S_e of S {
1:          Create a new node N
            Add N to T
            Extend T_M with a mapping from N to M(S_e).F
            Extend P with a mapping from N to { (S_e, F) }
        }
    }

    // Begin query processing
    Run a query with the following parameters:
        sources = T_M
        targets = M U T_M
    R = results = for each target node, the set of source nodes
        whose expressions are related to the target node's expression

    // Any new nodes that are related to existing nodes are
    // replaced by the existing nodes
    For each node G_e in G {
        Extend P with a mapping from G_e to { }
        For each element T_e of R(G_e) {
            If T_e is still in T then {
2:              Extend P with a mapping from G_e to P(T_e) U P(G_e)
                Delete the mapping for T_e from P
                Delete T_e from T and T_M
            }
        }
    }
    For each node T_e in T {
        // New nodes that aren't even related to themselves are dead
        If R(T_e) is empty then {
3:          Delete T_e from T and T_M
            Delete the mapping for T_e from P
        } else {
            For each element T_r of R(T_e) {
                If T_r is still in T and T_r is not equal to T_e {
4:                  // Merge T_r into T_e because they're related
                    Extend P with a mapping from T_e to P(T_e) U P(T_r)
                    Delete the mapping for T_r from P
                    Delete T_r from T and T_M
                }
            }
        }
    }
    // End query processing

    Let S = T
    For each node N in the domain of P {
        Extend M with a mapping from N to T_M(N)
        For each element (S_e, F) of P(N) {
5:          Add an edge to G from S_e to N labelled F
        }
    }
}

Figure 11-10. More efficient heap graph construction algorithm

There are two kinds of queries. Each query is parameterized by a set of source expressions
and a set of target expressions. For each target expression e1, the first kind of query
computes and returns a source expression e2 such that e1 <-> e2, or returns “unknown” if
no such e2 exists. The second kind of query computes and returns two distinct source
expressions e2 and e3 such that e1 <-> e2 and e1 <-> e3 (it may also return just one
expression or “unknown” if two such expressions do not exist). These queries are
implemented in the Ajax framework similarly to the abstract set query in Section 4.3.4, except
that when a set overflows its bound, its current contents are remembered and propagated.
For example, for the second kind of query, the result of { e2 } merged with { e3, e4 } could
be abstracted to “at least { e2, e3 }”.

Note that if intersection operations are applied to this “bounded set” query data, we may
have a result consisting of an “overflowing” set but with no elements known to be in the
set. (For example, consider the intersection of the abstract set “at least { e1 }” with the
abstract set “at least { }”; the result can only be “at least { }”.) This information is not
useful to the heap graph algorithm. Therefore this implementation of the object modeling
tool does not work with multiple intersecting analyses.
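
The “bounded set” abstraction and the two operations discussed above can be illustrated with a small Python sketch. The class BoundedSet and the functions merge and intersect are hypothetical names, and keeping the first elements in sorted order on overflow is an assumption made only to keep the example deterministic; the text does not say which elements Ajax retains.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BoundedSet:
    known: frozenset        # elements known to be in the set
    overflow: bool = False  # True reads as "at least `known`"

def merge(a, b, bound):
    """Union of two abstract sets, remembering at most `bound` elements."""
    union = a.known | b.known
    overflow = a.overflow or b.overflow or len(union) > bound
    # Assumption: on overflow, keep the first `bound` elements in sorted order.
    return BoundedSet(frozenset(sorted(union)[:bound]), overflow)

def intersect(a, b):
    """Intersection: only elements known on both sides remain known."""
    return BoundedSet(a.known & b.known, a.overflow or b.overflow)

# The merge example from the text, with a bound of two elements.
merged = merge(BoundedSet(frozenset({"e2"})),
               BoundedSet(frozenset({"e3", "e4"})), bound=2)

# The intersection example: "at least { e1 }" with "at least { }".
empty_but_overflowing = intersect(BoundedSet(frozenset({"e1"}), True),
                                  BoundedSet(frozenset(), True))
```

The second result is an overflowing set with no known elements, which is exactly the uninformative case described above.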

The  query  processing  of  the  above  algorithm  is  modified  as  shown  in  Figure  11-11.  In 
practice few iterations of the inner loop are required.

// Begin query processing
Run a query of the first kind with the following parameters:
    sources = T_M
    targets = M U T_M
R = results = for each target node, 0-1 source nodes
    whose expressions are related to the target node's expression

// Any new nodes that are related to existing nodes are
// replaced by the existing nodes
For each node G_e in G {
    Extend P with a mapping from G_e to { }
    For each element T_e of R(G_e) {
        If T_e is still in T then {
            Extend P with a mapping from G_e to P(T_e) U P(G_e)
            Delete the mapping for T_e from P
            Delete T_e from T and T_M
        }
    }
}
For each node T_e in T {
    // New nodes that aren't even related to themselves are dead
    If R(T_e) is empty then {
        Delete T_e from T and T_M
        Delete the mapping for T_e from P
    } else {
        For each element T_r of R(T_e) {
            If T_r is still in T and T_r is not equal to T_e {
                // Merge T_r into T_e because they're related
                Extend P with a mapping from T_e to P(T_e) U P(T_r)
                Delete the mapping for T_r from P
                Delete T_r from T and T_M
            }
        }
    }
}
Repeat {
    Run a query of the second kind:
        sources = T_M
        targets = T_M
    R = results = for each target node, 0-2 source nodes
        whose expressions are related to the target node's expression
    For each node T_e in T {
        For each element T_r of R(T_e) {
            If T_r is still in T and T_r is not equal to T_e {
                // Merge T_r into T_e because they're related
                Extend P with a mapping from T_e to P(T_e) U P(T_r)
                Delete the mapping for T_r from P
                Delete T_r from T and T_M
            }
        }
    }
} until R(T_e) = { T_e } for every T_e in T
// End query processing

Figure 11-11. Heap graph construction algorithm with reduced peak space consumption

11.2.3  Lossless  Improvement  to  the  Model 

After  constructing  the  heap  graph  and  elaborating  it  with  class  and  field  information,  the 
object  model  may  contain  superfluous  nodes  that  can  be  eliminated. 

11.2.3.1 Superfluous Leaf Classes

Field  edges  can  be  retargeted  from  their  declared  classes  to  some  actual  class  that  is  more 
specific  than  the  declared  class.  In  the  example  of  Figure  11-12,  the  analysis  engine  may 
suggest  that  the  name  field  refers  to  an  abstract  object  which  could  be  an  Integer  or  a 
String,  but  since  the  name  field  is  declared  to  be  a  String  and  no  other  fields  reference 
the  abstract  object,  the  name  field  is  retargeted  to  String.  This  can  leave  nodes  such  as 
Integer which are not reachable, i.e., no field edge points to the class or any of its
superclasses or subclasses.

Such nodes can never correspond to real objects in the program, so they can be deleted. In
the example, the Integer subclass can be removed. (The Object superclass can then
also be hidden.) These nodes can occur because of inaccuracy in the underlying analysis
engine. 

11.2.3.2  Merging  Identical  Subgraphs 

Consider the example on the left hand side of Figure 11-13. Suppose a programmer is
interested in discovering the Java types of the objects that may be (indirectly) referenced by
Orb, and which field dereference paths are involved.

Clearly it is unnecessary to distinguish the two Vectors for this task; the fact that the
two Vectors are not aliased is not important. In this case, one can save space in the model
by merging identical subgraphs. The Ajax object modeling tool provides this as an option.
The above example would be reduced as shown in Figure 11-13.

Figure 11-12. Example of field retargeting leaving unreachable nodes


Figure  11-13.  Example  of  merging  duplicate  subgraphs 
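
One possible way to find identical subgraphs is partition refinement: start by grouping nodes by class, then repeatedly split groups whose members disagree on the group reached through some field. The Python sketch below illustrates that idea under stated assumptions; it is not taken from the Ajax sources, and the function name, the input representation, and the example nodes (including the field names a, b and elem) are invented for illustration.

```python
def merge_identical_subgraphs(labels, fields):
    """Group nodes whose subgraphs are indistinguishable (sketch).

    labels: node -> class name
    fields: node -> {field name: target node}
    Returns node -> group id; nodes sharing an id can be merged.
    """
    block = dict(labels)  # initial partition: one group per class
    while True:
        numbering = {}
        refined = {}
        for node in labels:
            out = tuple(sorted((f, block[t])
                               for f, t in fields.get(node, {}).items()))
            signature = (block[node], out)
            refined[node] = numbering.setdefault(signature, len(numbering))
        # Refinement only splits groups, so an unchanged count means a fixpoint.
        if len(set(refined.values())) == len(set(block.values())):
            return refined
        block = refined

# Hypothetical model: an Orb referencing two unaliased Vectors of Strings.
labels = {"orb": "Orb", "v1": "Vector", "v2": "Vector",
          "s1": "String", "s2": "String"}
fields = {"orb": {"a": "v1", "b": "v2"},
          "v1": {"elem": "s1"}, "v2": {"elem": "s2"}}
groups = merge_identical_subgraphs(labels, fields)
```

Because the comparison is by group rather than by node identity, the sketch also handles cyclic graphs.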


11.2.4  User  Interface 

The  Ajax  object  modeling  tool  has  a  simple  user  interface.  The  user  specifies  the  program 
to  be  analyzed  by  giving  the  “class  path”  and  the  name  of  the  “main”  class.  By  default,  the 
tool  uses  as  root  expressions  all  the  local  variables  at  the  last  instruction  in  the  main  class 
reachable  by  non-exceptional  control  flow.  The  user  can  specify  an  explicit  root  expression 
instead,  if  desired.  The  tool  computes  the  model  and  outputs  the  results  in  a  format  suitable 
for  processing  by  AT&T’s  dot  tool  for  graph  layout  [36]. 

11.3  Examples 

11.3.1  JavaP  Example 

Figure 11-14 shows the object model produced by Ajax applied to Sun’s JavaP
disassembler tool. Isomorphic subgraphs have not been merged. This example clearly shows the
strengths  and  limitations  of  the  Ajax  object  modeling  tool. 

This  model  uses  the  default  set  of  root  expressions  —  all  the  local  variables  at  the  last 
instruction in JavaP.main reachable by non-exceptional control flow. The tool uses the
SEMI analysis.

The figure shows multiple occurrences of the Hashtable class. Each Hashtable has
an array of HashtableEntries, and each HashtableEntry has a key and value. In
Java, the keys and values are declared as Objects, but in most cases Ajax has been able
to resolve them to specific classes, revealing the actual keys and values of each Hashtable.
For example, we can see that Local-Environment.packages is a Hashtable mapping
Identifiers to Packages (in the dashed outline).

On  the  left  hand  side  of  the  model  are  a  number  of  occurrences  of  stream-related  classes. 
This  part  of  the  model  reveals,  for  example,  that  the  JavaP  object’s  output  field  is  a 
PrintWriter wrapping an OutputStreamWriter wrapping a PrintStream
wrapping a BufferedOutputStream wrapping a FileOutputStream (as
indicated by the fat dashed arrows). Each of these Writer or Stream objects contains an out
field  referencing  the  Writer  or  Stream  it  wraps.  None  of  these  relationships  are  apparent 
from  the  Java  class  declarations  alone,  because  the  out  fields  are  simply  declared  as 
Writer  or  OutputStream. 

On the right hand side of the model is an Object node with many edges leading into it,
e.g., from the key and value fields of several Hashtables. Here the analysis was not
powerful enough to distinguish the objects referenced by the incoming fields or to precisely
determine their classes. The model reveals only that the referenced objects are either
Strings, Numbers, FieldDefinitions, ClassDeclarations, or subclasses
of  one  of  those  classes.  This  is  a  problem  that  becomes  increasingly  severe  as  the  analyzed 
programs  grow:  imprecision  in  the  analysis  leads  to  a  few  nodes  covering  a  very  large 
number  of  different  kinds  of  run-time  objects.  Field  edges  that  lead  to  such  nodes  do  not 
convey  much  useful  information. 

A  fundamental  problem  revealed  by  this  example  is  that  this  graph  is  about  as  large  as  one 
can  usefully  lay  out  and  read.  It  has  96  nodes  and  157  edges,  and  JavaP  is  a  relatively  small 
Java  program.  As  graphs  get  larger,  it  becomes  rapidly  more  difficult  to  visualize  them  in 
a  reasonable  way. 

11.3.2  CTAS  Example 

Figure 11-15 shows the object model produced by Ajax applied to the CTAS example. The
setup  is  the  same  as  for  the  previous  example.  This  graph  has  122  nodes  and  166  edges. 

This  model  reveals  some  interesting  facts,  e.g.,  that  the  postRecvHandlers, 
sendHandlers  and  mainRecvHandlers  of  HandlerManager  are  all  empty. 
(They  are  used  by  other  applications  based  on  this  code,  but  not  by  the  test  program  under 
analysis.) The model reveals that ConnectionManager.socketQueue is a Vector
of Sockets, and is able to distinguish many different uses of CTAS’s HandlerTable
class.

On the negative side, there is again an Object node covering a large number of different
kinds of objects that seem to be unrelated but are not distinguished by the analysis.

11.3.3 Improving The Model By Discarding Information

11.3.3.1  Removing  “Lumps” 

Ajax  object  models  for  large  programs  are  often  crippled  by  the  “large  lump”  problem, 
where  the  analysis  creates  one  or  more  Object  nodes  covering  a  large  number  of  different 
kinds  of  objects  that  are  not  truly  related.  These  “lumps”  cause  the  model’s  graph  to  be 
overconstrained,  making  it  difficult  to  lay  out  and  obscuring  useful  information. 

One  way  to  extract  some  useful  information  from  these  models  is  to  detect  and  remove 
inaccurate “lumps” from the model graph. A useful heuristic is to remove nodes
corresponding to abstract objects whose most specific known superclass is Object and which
have many incoming edges. The field edges leading to such nodes are annotated to indicate
that the referent of the field is not known. Nodes with many incoming edges especially
impede comprehensible graph layout using hierarchy-based layout tools such as dot, so it
is especially advantageous to remove them.

This approach sacrifices some information in the hope that some of the remaining
information may still be useful to the user. A model that presents some information in a usable
form is more useful than an incomprehensibly large model.
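
The lump heuristic can be sketched as a simple filter over the model graph. The Python below is illustrative only: the function name remove_lumps, the edge-list representation, and the example data are assumptions, and the default threshold of seven incoming edges simply mirrors the rule used for the Jess example in Section 11.3.4.

```python
def remove_lumps(node_class, edges, max_in=7):
    """Drop Object nodes with more than max_in incoming edges (sketch).

    node_class: node -> most specific known superclass name
    edges: list of (source, field, target) triples
    Returns (kept nodes, kept edges, fields whose referent is now unknown).
    """
    indegree = {}
    for _, _, target in edges:
        indegree[target] = indegree.get(target, 0) + 1
    lumps = {n for n, cls in node_class.items()
             if cls == "java.lang.Object" and indegree.get(n, 0) > max_in}
    kept_nodes = {n: c for n, c in node_class.items() if n not in lumps}
    kept_edges = [(s, f, t) for s, f, t in edges
                  if s not in lumps and t not in lumps]
    # Edges into a removed lump survive only as "referent unknown" annotations.
    unknown = [(s, f) for s, f, t in edges if t in lumps and s not in lumps]
    return kept_nodes, kept_edges, unknown

# Invented example: eight Hashtables whose values all collapse into one lump.
node_class = {"lump": "java.lang.Object", "s": "java.lang.String"}
node_class.update({f"h{i}": "java.util.Hashtable" for i in range(8)})
edges = [(f"h{i}", "value", "lump") for i in range(8)] + [("h0", "name", "s")]
nodes, kept, unknown = remove_lumps(node_class, edges)
```

Returning the annotated fields separately lets a layout step label those fields instead of drawing long edges into a shared node.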

11.3.3.2  Hiding  Strings  And  Other  Classes 

As  described  in  Section  8.4.3,  most  references  to  String  objects  are  aliased  because  they 
may  refer  to  String  objects  extracted  from  the  “constant  pool”.  Thus,  in  an  object  model, 
most  fields  of  type  String  lead  to  a  common  node.  This  clutters  the  graph  layout  with  a 
large number of long edges. Furthermore, few programmers are interested in
disambiguating String references even when this is possible. Therefore the Ajax object modeling
tool  can  optionally  remove  the  common  String  node  and  annotate  relevant  field  edges  to 
indicate  that  the  referent  is  some  unknown  String. 

The  same  technique  can  also  be  useful  for  other  classes.  The  Ajax  object  modeling  tool 
allows  the  user  to  explicitly  specify  an  arbitrary  set  of  classes  to  be  elided;  optionally,  all 
subclasses  of  a  specified  class  can  be  elided. 

11.3.4  Jess  Example 

Figure  11-16  illustrates  these  techniques  applied  to  an  object  model  for  the  Java  Expert 
System Shell example. To produce a model of manageable size, the details of the
stream-related classes are elided by the tool using the techniques described in Section 11.3.3.2. The
rules for elision are specified manually. In this case the rules are:

• Elide all lumps with more than seven incoming edges.

•  Elide  all  Strings. 

•  Elide  all  subclasses  of  InputStream. 

•  Elide  all  subclasses  of  OutputStream. 

As  in  the  previous  examples,  this  example  reveals  the  contents  of  many  of  the  container 
objects.  It  also  reveals  some  information  that  may  be  surprising;  for  example,  the  Rete’s 
m_clearables  Vector  is  always  empty.  Also,  there  are  (at  least)  two  distinct  instances 
of  the  Jesp  engine  object. 

This  graph  contains  189  nodes  and  243  edges.  The  corresponding  complete  graph  (without 
any  node  elision)  contains  885  nodes  and  1173  edges.  The  complete  graph  is  much  too 
complex  to  be  automatically  laid  out  in  a  comprehensible  way.  Therefore,  although  this 
reduced  graph  contains  less  information,  in  practice  it  is  much  more  useful  because  its 
information  is  much  more  accessible. 

This  example  shows  one  remaining  problem  with  Ajax  object  models:  it  reveals 
unimportant  implementation  details  of  library  classes.  For  example,  the  details  of  the 
implementation of Hashtable are revealed, when it would be better to simply show that
Hashtables  contain  keys  and  values. 

11.4  Conclusions 

11.4.1  Contributions 

Using  the  Ajax  VPR,  it  is  possible  to  construct  heap  graphs  and  object  models.  However, 
inaccuracies  in  the  analysis  and  the  sheer  size  of  the  graphs  produced  can  cripple  the 
usefulness  of  these  graphs.  Simple  pruning  countermeasures  result  in  graphs  that  contain 
accessible,  useful  and  surprising  information,  even  for  large  programs.  This  information 
cannot  be  easily  automatically  obtained  using  other  techniques,  especially  those  that  rely 
on  declared  class  information. 

The  Ajax  VPR  is  not  the  ideal  abstraction  to  use  for  computing  heap  graphs.  Extensive 
postprocessing  is  required.  A  tool  with  direct  access  to  SEMI’s  constraint  structures  would 
be  more  efficient.  Given  the  Ajax  infrastructure,  however,  it  seemed  to  be  less  work  to 
compute  the  heap  graphs  from  the  VPR  than  to  bypass  the  VPR  and  hook  into  the  SEMI 
implementation. 

11.4.2  Future  Work 

One  major  remaining  problem  with  these  models  is  that  they  have  no  notion  of  scope.  In 
particular,  they  expose  the  implementation  of  library  data  structures.  Instead  it  would  be 
preferable  to  only  show  classes  and  fields  visible  to  the  user.  On  the  other  hand,  sometimes 
information  about  private  fields  is  useful  to  the  user  —  for  example,  the  key  and  value 
fields  of  HashtableEntry  convey  very  useful  information.  Heuristics  or  other 
techniques to resolve this problem are an interesting area for future inquiry.

12 A Scanning Tool


12.1  Introduction 

Programmers  are  adept  at  using  simple  tools  such  as  “grep”  to  scan  programs.  More 
advanced  cross-referencing  and  scanning  tools  such  as  class  browsers,  indexed  full-text 
search  engines,  and  hyperlinked  source  browsers  such  as  LXR  [91],  are  also  very  popular. 
However, none of these tools are semantics-based; they use syntactic or lexical
information.

Using  the  Ajax  analysis  toolkit,  it  is  not  difficult  to  build  similar  tools  that  utilize  semantic 
information  about  the  program.  To  demonstrate  this,  I  built  a  simple  example  called 
“JGrep”,  and  used  it  to  reverse  engineer  some  of  the  example  programs. 

12.2  The  JGrep  Tool 

12.2.1  User  Interface 

JGrep  has  a  simple  “command  line”  interface,  although  it  would  be  trivial  to  incorporate  it 
into  a  graphical  or  Web-based  interface  such  as  LXR.  The  user  specifies  the  program  to 
analyze,  and  a  program  expression  (including  a  code  location).  The  expression  need  not 
actually  occur  in  the  program  text.  JGrep  reports  information  about  all  the  objects  which 
might  be  returned  as  the  result  of  the  expression  at  the  given  location. 

Four  kinds  of  information  are  returned: 

•  New  sites:  all  program  locations  where  the  objects  are  created. 

• Call sites: all program locations where one of the objects is passed as the “this”
parameter to a method call.

•  Read  sites:  all  program  locations  where  a  field  of  one  of  the  objects  is  read. 

•  Write  sites:  all  program  locations  where  a  field  of  one  of  the  objects  is  written. 

Since  Ajax  performs  conservative  analysis,  some  spurious  sites  may  be  returned  along  with 
the  true  sites. 

The  user  can  control  which  kinds  of  sites  are  returned,  using  command  line  options. 

12.2.2  Implementation 

JGrep  is  easy  to  implement  using  the  Ajax  toolkit.  It  comprises  462  lines  of  code. 
Collecting  the  sets  of  sites  is  a  simple  application  of  the  value-point  relation.  The  source  set 
S  is  a  singleton  set  containing  the  user-specified  expression,  and  the  target  set  T  contains 
expressions for all the sites the user is interested in:

• New: The results of all “new” instructions, i.e., the top of the operand stack at the
instruction after each new, newarray, anewarray and multianewarray
instruction.

• Call: The stack element representing the “this” operand at every invokevirtual,
invokespecial and invokeinterface instruction.

• Read: The top of the operand stack at each getfield instruction.

• Write: The top of the operand stack at each putfield instruction.

The  “intermediate  data”  propagated  by  the  analysis  are  boolean  values,  initially  set  to  false 
and  then  set  to  true  for  the  solitary  source  expression  and  all  expressions  reachable  from  it 
in the analyzer’s graph. For each target expression receiving the value “true”, the tool prints
out the code location associated with that expression, i.e., the location of the “new”
instruction, the “call” instruction, the getfield instruction or the putfield
instruction.
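
The boolean propagation described above amounts to a reachability computation. The following Python sketch shows the idea on toy data; the function jgrep, the dictionary standing in for the analyzer's graph, and the expression names are hypothetical, and the code locations are illustrative rather than real tool output.

```python
from collections import deque

def jgrep(flow, source, targets):
    """Report the sites reachable from the single source expression (sketch).

    flow: expression -> expressions reachable in one step (a stand-in for
          the analyzer's graph, whose real form is not shown here)
    targets: expression -> (kind, code location)
    """
    reached = {source}          # expressions whose boolean is "true"
    work = deque([source])
    while work:
        expr = work.popleft()
        for nxt in flow.get(expr, ()):
            if nxt not in reached:
                reached.add(nxt)
                work.append(nxt)
    return sorted(site for expr, site in targets.items() if expr in reached)

# Toy data, invented for the sketch.
flow = {
    "query": ["v"],
    "v": ["new@Rete.<init>", "this@Rete.clear"],
    "other": ["new@Main.main"],
}
targets = {
    "new@Rete.<init>": ("NEW", "jess.Rete.<init>#182"),
    "this@Rete.clear": ("CALL", "jess.Rete.clear#67"),
    "new@Main.main": ("NEW", "jess.Main.main#10"),
}
sites = jgrep(flow, "query", targets)
```

Only the two sites connected to the query expression are reported; the unrelated allocation in the toy data is left out.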

JGrep  currently  accepts  and  prints  code  locations  as  the  fully  qualified  name  of  a  method 
and a bytecode offset within that method, e.g., “jess.Main.main#373:local-9”
(local variable 9 in class jess.Main, method main, bytecode offset 373). It would be easy,
and highly desirable, to input and output source line numbers and source-level
expressions instead.

JGrep  currently  reanalyzes  the  program  for  every  query,  which  means  that  there  is  a  large 
delay  between  posing  a  query  and  receiving  an  answer.  However,  it  would  be  easy  to  have 
JGrep  run  the  analysis  engine  once  and  then  answer  a  succession  of  queries. 

12.3  Examples 

12.3.1  Checking  an  Anomaly 

The object model for Jess presented in Section 11.3.4 shows that the Rete’s
m_clearables Vector is always empty. To investigate further, one simply submits to
JGrep an expression corresponding to a path to the desired node in the object model:

    jess.Main.main#373:local-9,jess.Jesp.m_engine,jess.Rete.m_clearables

This expression specifies local variable 9 at offset 373 in the method main in class
jess.Main, a reference to the Jesp application object, followed by two field
dereferences: first, the dereference of field m_engine declared in class jess.Jesp, to get the
Rete engine, and then the dereference of field m_clearables in class jess.Rete.
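
To make the structure of such an expression concrete, here is a small Python parser. The grammar is inferred only from the examples in this chapter (a method name, “#” and a bytecode offset, “:local-” and a variable index, then comma-separated field names), so it is an assumption rather than a description of JGrep's actual parser; Query, split_top_level and parse_query are hypothetical names.

```python
from typing import NamedTuple

class Query(NamedTuple):
    method: str     # fully qualified method name, possibly with a signature
    offset: int     # bytecode offset within the method
    local: int      # local variable index
    fields: tuple   # field dereferences applied in order

def split_top_level(text):
    """Split on commas that are not inside a parenthesised signature."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts

def parse_query(text):
    head, *fields = split_top_level(text)
    location, _, variable = head.rpartition(":")
    method, _, offset = location.rpartition("#")
    if not variable.startswith("local-"):
        raise ValueError("expected a local variable, got " + repr(variable))
    return Query(method, int(offset), int(variable[len("local-"):]),
                 tuple(fields))

query = parse_query(
    "jess.Main.main#373:local-9,jess.Jesp.m_engine,jess.Rete.m_clearables")
```

Splitting only on top-level commas keeps a parenthesised parameter list, as found in a constructor signature, inside the method name.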

The  “New”  and  “Call”  sites  output  are  shown  in  Figure  12-1. 

The single “NEW” site reveals immediately that the Vector is created in Rete’s
constructor (jess.Rete.<init>). The call to java.util.Vector.elements
shows that the Vector’s elements are scanned in the method Rete.clear(). The call
to java.util.Vector.removeAllElements indicates that it is emptied in
Rete.clear(). There are no calls to methods that add elements to the Vector.

CALL to method void java.lang.Object.<init>()
    Offset 1 in method void java.util.Vector.<init>(int, int)
CALL to method void java.util.Vector.<init>(int, int)
    Offset 3 in method void java.util.Vector.<init>(int)
CALL to method void java.util.Vector.<init>(int)
    Offset 3 in method void java.util.Vector.<init>()
NEW of class java.util.Vector:
    Offset 182 in method void jess.Rete.<init>(jess.ReteDisplay)
CALL to method void java.util.Vector.<init>()
    Offset 186 in method void jess.Rete.<init>(jess.ReteDisplay)
CALL to method java.util.Enumeration java.util.Vector.elements()
    Offset 67 in method void jess.Rete.clear()
CALL to method void java.util.Vector.removeAllElements()
    Offset 249 in method void jess.Rete.clear()

Figure 12-1. Output of the creation sites and method calls on the m_clearables object

This  information  is  helpful  because  it  indicates  to  the  programmer  that  if  there  were  any 
elements in the Vector, they could only be used in the method Rete.clear. Therefore
further  investigation  of  this  anomaly  should  focus  on  that  method.  If  such  investigation 
proves  that  an  empty  m_clearables  is  benign,  then  the  entire  field  can  be  removed  and 
we  can  be  sure  that  no  other  code  will  be  affected. 

This  example  illustrates  the  power  of  the  SEMI  analysis;  a  simpler  analysis  such  as  RTA 
would  not  have  been  able  to  distinguish  the  different  Vectors  used  in  the  program. 
Running “grep” over the Jess sources finds 43 occurrences of the name Vector, 5
occurrences of the name removeAllElements, 27 occurrences of the name elements, 34
occurrences  of  the  name  elementAt,  and  22  occurrences  of  the  name  addElement.  It 
would  require  significant  effort  to  sort  through  these  occurrences  to  find  the  three  sites 
specifically  operating  on  the  m_clearables  Vector. 

12.3.2  Checking  Field  Accesses 

In  JavaC,  there  is  a  class  BatchEnvironment  with  a  public  flags  field.  It  is  natural 
to  wonder  whether  and  how  this  field  is  accessed  —  is  there  an  abstraction  violation 
occurring,  and  in  what  form?  JGrep  provides  the  answer,  using  a  query  for  the  read  and 
write accesses to the objects denoted by the expression:

    sun.tools.javac.BatchEnvironment.<init>(java.io.OutputStream,
        sun.tools.java.ClassPath, sun.tools.javac.ErrorConsumer)#0:local-0

This  expression  denotes  the  “this”  objects  of  the  most  general  constructor  for 
BatchEnvironment.  The  results  for  the  flags  field  are  shown  in  Figure  12-2. 

All the accesses are from one of three methods:

sun.tools.javac.Main.compile (read and written)

sun.tools.javac.BatchEnvironment.getFlags (read only)

sun.tools.javac.BatchEnvironment.reportError (read and written)

READ from field "flags" of class sun.tools.javac.BatchEnvironment:
    Offset 742 in method boolean
    sun.tools.javac.Main.compile(java.lang.String[])
WRITE to field "flags" of class sun.tools.javac.BatchEnvironment:
    Offset 714 in method boolean
    sun.tools.javac.Main.compile(java.lang.String[])
WRITE to field "flags" of class sun.tools.javac.BatchEnvironment:
    Offset 749 in method boolean
    sun.tools.javac.Main.compile(java.lang.String[])
READ from field "flags" of class sun.tools.javac.BatchEnvironment:
    Offset 708 in method boolean
    sun.tools.javac.Main.compile(java.lang.String[])
READ from field "flags" of class sun.tools.javac.BatchEnvironment:
    Offset 1 in method int sun.tools.javac.BatchEnvironment.getFlags()
READ from field "flags" of class sun.tools.javac.BatchEnvironment:
    Offset 216 in method void
    sun.tools.javac.BatchEnvironment.reportError(java.lang.Object, int,
    java.lang.String, java.lang.String)
WRITE to field "flags" of class sun.tools.javac.BatchEnvironment:
    Offset 222 in method void
    sun.tools.javac.BatchEnvironment.reportError(java.lang.Object, int,
    java.lang.String, java.lang.String)
WRITE to field "flags" of class sun.tools.javac.BatchEnvironment:
    Offset 92 in method void
    sun.tools.javac.BatchEnvironment.reportError(java.lang.Object, int,
    java.lang.String, java.lang.String)
READ from field "flags" of class sun.tools.javac.BatchEnvironment:
    Offset 86 in method void
    sun.tools.javac.BatchEnvironment.reportError(java.lang.Object, int,
    java.lang.String, java.lang.String)

Figure 12-2. Accesses to the flags field of BatchEnvironment

Note  that  this  example  does  not  particularly  benefit  from  SEMI.  The  same  results  are 
obtained  using  Ajax’s  RTA  engine,  because  there  is  really  only  one  instance  of 
BatchEnvironment  used  in  the  program. 

12.4  Conclusions 

Using  the  alias  information  obtained  by  Ajax,  it  is  easy  to  write  simple  and  useful  search 
tools.  These  tools  improve  on  the  functionality  available  from  lexical  and  syntactic  tools  in 
a  natural  way.  Additional  postprocessing  could  improve  the  utility  of  the  results,  but  even 
the simplest approaches are useful. There is significant scope for new searching and
visualization tools based on these techniques.

13 Conclusions


13.1  Summary 

Ajax  demonstrates  that  sound,  static,  global  alias  analysis  can  be  used  as  the  basis  for  a 
variety  of  software  engineering  tools.  These  tools  produce  interesting  and  nontrivial  results 
that  cannot  be  obtained  by  other  existing  methods. 

The  Ajax  design  shows  that  it  is  practical  to  separate  analysis  implementations  from  tools 
that  consume  alias  information.  The  specification  for  an  analysis  engine  is  semantically 
simple,  as  defined  by  the  value-point  relation,  but  powerful  enough  to  enable  cheap 
construction of a wide range of tools. The interface is also efficient; for most configurations,
the scalability of the system is constrained by the scalability of the underlying
analysis  and  not  by  the  overhead  of  the  VPR  interface.  The  exception  is  the  object 
modelling  tool.  It  takes  a  significant  amount  of  code  and  execution  resources  to  reconstruct 
a  “heap  graph”  from  the  VPR,  and  also  requires  a  strengthened  definition  of  the  VPR. 

Ajax  also  shows  that  it  is  possible  to  implement  the  VPR  interface  using  very  different 
analyses  —  RTA,  based  on  declared  language  types,  SEMI,  based  on  polymorphic  type 
inference,  and  a  hybrid  analysis  based  on  the  “intersection”  of  these  two  analysis  engines. 
The  strong  separation  between  analyses  and  tools  ensures  that  all  tools  work  correctly 
regardless  of  the  analysis  configuration.  The  analysis  technique  can  be  selected  at  run  time 
according  to  the  desired  accuracy  for  the  task  at  hand  and  the  execution  resources  available. 
For  example,  for  finding  the  set  of  possibly  live  methods,  RTA  is  usually  good  enough,  but 
SEMI  is  much  better  for  resolving  virtual  method  calls,  albeit  more  expensive. 

The  VPR  interface  also  enables  easy  composition  of  analyses.  It  is  trivial  to  build  an 
analysis  that  computes  the  intersection  of  the  results  of  two  or  more  other  analyses.  Ajax 
can  also  provide  “sequential  composition”;  for  example,  SEMI  can  use  some  other  arbitrary 
analysis  to  compute  the  call  graph  it  uses  to  reduce  programs  to  first  order. 
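
As a rough illustration of why composition by intersection is cheap, the sketch below models an analysis as an object that answers one question: may two expressions refer to the same value? This is a deliberate simplification of the VPR interface. The class names, the `may_alias` method and the toy expression names are hypothetical and are not taken from Ajax.

```python
from typing import Iterable, Protocol, Tuple


class Analysis(Protocol):
    """Hypothetical stand-in for a VPR-style engine (not the Ajax API)."""

    def may_alias(self, e1: str, e2: str) -> bool:
        ...


class TableAnalysis:
    """Toy engine backed by an explicit set of related expression pairs."""

    def __init__(self, pairs: Iterable[Tuple[str, str]]) -> None:
        # Store pairs unordered so that the relation is symmetric.
        self._pairs = {frozenset(p) for p in pairs}

    def may_alias(self, e1: str, e2: str) -> bool:
        return e1 == e2 or frozenset((e1, e2)) in self._pairs


class Intersection:
    """Reports a pair only if every component engine reports it.

    If each component soundly over-approximates the true relation,
    so does the intersection, and it is at least as precise as each.
    """

    def __init__(self, *engines: Analysis) -> None:
        self._engines = engines

    def may_alias(self, e1: str, e2: str) -> bool:
        return all(engine.may_alias(e1, e2) for engine in self._engines)


# Invented results for two engines over three expressions.
coarse = TableAnalysis([("a", "b"), ("a", "c"), ("b", "c")])
finer = TableAnalysis([("a", "b")])
combined = Intersection(coarse, finer)
```

Because `Intersection` exposes the same method as its components, any tool written against the interface can use the combined engine without change.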

SEMI  shows  that  type  inference  with  polymorphic  recursion  can  usefully  be  applied  to 
large  Java  programs,  especially  if  the  program  is  conservatively  reduced  to  first-order  code 
before  the  application  of  SEMI.  I  have  proven  SEMI  sound  with  respect  to  a  simplified  — 
but  still  very  rich  —  model  of  the  Java  bytecode,  and  shown  that  SEMI  can  even  analyze 
programs  which  do  not  conform  to  the  static  safety  checks  usually  performed  by  Java. 
SEMI provides a significant improvement in accuracy over a wide range of tools and
example programs, and captures implicit type parametricity in Java programs well, proving
a large percentage of downcasts safe in most programs. However, SEMI is less accurate in
larger  programs,  because  imprecision  in  analyzing  one  part  of  the  code  spills  over  into  other 
parts  of  the  code.  Although  SEMI  can  indeed  analyze  some  large  programs  (Ladybug 
having  over  5,000  methods),  its  scalability  in  terms  of  resource  consumption  and  accuracy 
still leaves much to be desired.

Polymorphic recursion plays an interesting role in SEMI. I have described several
techniques  required  to  make  the  SEMI  implementation  of  polymorphic  recursion  practical. 
The  benefits  of  polymorphic  recursion  vary  by  tool:  in  the  virtual  call  resolution  tool, 
polymorphic  recursion  improves  accuracy  only  a  little,  but  for  checking  downcasts, 
unrestricted  polymorphic  recursion  improves  accuracy  a  great  deal  —  but  only  when  the 
program  is  initially  reduced  to  first  order.  The  generality  of  the  SEMI  constraint  solving 
engine  seems  to  limit  its  performance  compared  to  other  systems  based  on  Hindley-Milner 
type  inference  [54]  [69]. 

My  work  shows  that  composing  RTA  and  SEMI  by  intersection  is  very  useful.  RTA  is  so 
cheap  that  performance  is  not  noticeably  affected,  and  for  many  tools  the  combined 
analyses  are  significantly  more  accurate  than  either  analysis  alone. 

Most  of  the  Ajax  tools  were  easy  and  cheap  to  build.  Of  all  the  tools,  I  personally  feel  that 
the  most  immediately  useful  is  “JGrep”,  having  used  it  myself  to  reverse-engineer  some  of 
the  example  programs  for  which  source  code  is  not  available.  It  is  very  useful  to  be  able  to 
track  down  all  accesses  to  one  instance  of  a  commonly  reused  class.  The  object  modelling 
tool  demonstrates  that  starting  with  alias  information  and  transforming  it  into  an  object 
model  can  produce  more  precise  models  than  existing  techniques,  which  start  with  a  class 
hierarchy  model  and  improve  its  precision  using  heuristics  or  other  analysis  [46]. 

Accounting  for  the  behavior  of  non-Java  code  —  i.e.,  native  code  and  reflection  —  required 
a  great  deal  of  work.  This  is  an  important  problem  because  real  programs  (especially  the 
standard  Java  libraries)  use  these  features  often,  and  in  a  variety  of  ways.  Ajax  provides 
thorough  handling  of  non-Java  code  by  accepting  specifications  describing  how  non-Java 
code  is  used  by  the  application.  However,  unavailability  of  the  whole  program  remains  a 
fundamental  problem. 

13.2  Outlook 

There  are  many  possible  future  directions  for  this  work: 

•  SEMI  is  too  slow  at  analyzing  very  large  programs.  It  may  be  possible  to  reimplement  a 
similar  analysis  to  achieve  much  higher  performance,  perhaps  using  a  design  similar  to 
Ruf’s escape analysis for Java [69]. Alternatively, it may be possible to design a simpler
analysis with some of the desirable features of SEMI.

• SEMI’s accuracy degrades as program size increases. Addressing this may require
improved analysis techniques. Some limited flow-sensitive analysis might improve
accuracy, as might tighter integration of language type information into SEMI’s computations.
One improvement that would be almost certain to provide increased accuracy
would be the introduction and use of “parity annotations” on instance constraints,
as described by Fähndrich, Rehof and Das [31].

•  It  would  be  very  interesting  to  implement  more  analyses  in  the  Ajax  framework.  Ajax 
provides  a  great  deal  of  infrastructure  to  make  it  easier  to  implement  analyses.  Ajax 
also provides a tool suite; once an analysis has been implemented, it can be immediately
applied to a wide range of problems. Analysis composition is also very easy in
Ajax, and can compensate for weaknesses in one particular analysis technique. Also,
because Ajax provides a single description of the behavior of non-Java code and a fixed
specification  of  sound  analysis  results,  it  is  both  easy  and  fair  to  compare  the  accuracy 
and  performance  of  different  analyses  implemented  in  Ajax. 

•  The  VPR  is  not  the  ultimate  abstraction  of  program  behavior.  It  has  very  limited 
expression  of  context:  for  example,  it  is  impossible  to  ask  whether  two  expressions  in  a 
method get the same value during the same invocation of the method. It is also impossible
to specify that an expression should apply not just at a particular program point, but
also when its method has a particular caller. SEMI can capture some of this information.
The VPR could be extended to allow this information to be communicated to
tools. 

• The VPR could also be extended to accommodate different behaviors of tags in the
tagged  bytecode  semantics.  For  example,  one  might  wish  to  have  addition  take  two 
operands  with  the  same  tag  and  return  a  result  with  the  same  tag  as  the  operands.  Thus 
an  expression  referring  to  the  result  of  an  addition  would  match  an  expression  referring 
to  one  of  the  operands.  This  would  allow  Ajax  to  address  additional  tasks. 

•  More  tools  could  easily  be  built  in  the  Ajax  framework.  Accessible  alias  analysis  opens 
up  many  possibilities  for  new  tools  for  various  programmer  tasks. 

• Sound, global, static analysis of Java programs is inherently difficult because Java programs
use Java features that are not amenable to static analysis, such as reflection. Furthermore,
modern software environments consist of dynamically configured
components, often interacting over channels not amenable to static analysis, e.g., by
exchanging XML data. Thus many applications are not amenable to sound global static
analysis.

• It may be necessary to perform local static analysis. In particular, it would be interesting
to make “worst case” assumptions about missing code and then measure the
accuracy of the resulting analyses. It would also be interesting to introduce “reasonable”
heuristics to approximate the behavior of missing code and then measure analysis
accuracy.

•  It  is  easy  to  change  the  definition  of  the  VPR  to  quantify  over  some  fixed  finite  set  of 
program traces (e.g., some program traces that have actually been obtained by running
the program) instead of all traces. An Ajax analysis could compute a precise
VPR for a program by running it on test data and recording the execution. The existing
Ajax tools would be immediately usable with this dynamic analysis.
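
The trace-based variant in the last item can be sketched in a few lines. The sketch assumes that each recorded trace is a list of (expression, value identity) observations, and it relates two expressions when some recorded trace shows both producing the same value. The trace format and all names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple

# One observation: (expression label, identity of the value it produced).
Observation = Tuple[str, Hashable]
Trace = List[Observation]


def dynamic_relation(traces: List[Trace]) -> Set[FrozenSet[str]]:
    """Relate two expressions if, within one recorded trace,
    both were observed producing the same value identity."""
    related: Set[FrozenSet[str]] = set()
    for trace in traces:
        # Value identities are only meaningful inside a single run.
        seen_by_value: Dict[Hashable, Set[str]] = defaultdict(set)
        for expr, value_id in trace:
            seen_by_value[value_id].add(expr)
        for exprs in seen_by_value.values():
            for e1, e2 in combinations(sorted(exprs), 2):
                related.add(frozenset((e1, e2)))
    return related


# Two invented runs; value identities are arbitrary tokens.
runs = [
    [("m1:x", "obj1"), ("m2:y", "obj1"), ("m3:z", "obj2")],
    [("m3:z", "obj7"), ("m2:y", "obj7")],
]
relation = dynamic_relation(runs)
```

A relation built this way is exact for the recorded runs only; it is not a sound approximation of all executions.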

I predict that in the foreseeable future, tasks such as program understanding, which do not
absolutely require sound static analysis of code, will best be addressed by other means, such
as dynamic analysis or unsound static analysis. Tasks which do require sound static
analysis, such as compilers or verification tools, will need to perform local analysis of
individual components, relying on whatever explicit (run time checkable) annotations exist
at component boundaries to specify the behavior of “external” code.

Appendix A: Polymorphic Recursion, Unrestricted
Recursive Types and Principal Types

Consider  a  standard  lambda  language  with  a  type  system  having  polymorphic  recursion  and 
unrestricted ($\mu$) recursive types. I prove that there exist typable program terms that have no
principal type.

A.1 Intuition

In the setting of $\mu$-recursive types, a type T for a term f is principal iff T is a type of f and
every  type  of  f  is  equivalent  to  an  instance  of  T,  where  type  equivalence  means  that  the 
(possibly  infinite)  regular  labelled  trees  corresponding  to  the  types  are  identical. 

Consider  the  following  function,  written  in  ML-like  syntax: 

fun  f  (a,  b)  =  f  b 

This  function  is  typable  using  polymorphic  recursion  and  unrestricted  recursive  types,  but 
there  is  no  principal  type.  A  list  of  valid  types  is  below.  All  free  variables  are  assumed  to 
be  universally  quantified. 

$(\mu t.\, v \times t) \to u$

$w \times (\mu t.\, v \times t) \to u$

$x \times (w \times (\mu t.\, v \times t)) \to u$

Informally we could write these types as “$(v, (v, (v, \ldots))) \to u$”, “$(w, (v, (v, (v, \ldots)))) \to u$”,
and “$(x, (w, (v, (v, (v, \ldots))))) \to u$”. This leads to the intuition that the principal type would
need to have an unbounded number of quantified variables, but such types do not exist.

A.2  Proof 

More  formally,  suppose  T  is  the  principal  type  of  the  function  f  given  above.  We  show  that 
this  leads  to  a  contradiction. 

Let m be the number of free variables in T. Define

$J_0 = \mu t.\, v \times t$

$J_n = w_n \times J_{n-1} \quad (n > 0)$

For all n, $J_n \to u$ is a type of f. This is easily shown by induction on n.
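
One way to check this claim, sketched here rather than quoted from the original proof: under polymorphic recursion, the recursive call in the body of f may use any instance of the type assumed for f, and shifting each variable down by one position turns $J_n$ into $J_{n-1}$. The substitution name $\theta$ is introduced only for this illustration, and products are read as nesting to the right.

```latex
% Assume f : \forall w_1 \dots w_n\, v\, u.\ J_n \to u while typing the body (n \ge 1).
% The argument pattern (a, b) has type J_n = w_n \times J_{n-1},
% so a : w_n and b : J_{n-1}.
\begin{align*}
\theta &= [\, w_n \mapsto w_{n-1},\ \dots,\ w_2 \mapsto w_1,\ w_1 \mapsto v \,]
          && \text{(simultaneous substitution)} \\
\theta(J_n) &= w_{n-1} \times \dots \times w_1 \times (v \times \mu t.\, v \times t) \\
            &= w_{n-1} \times \dots \times w_1 \times \mu t.\, v \times t
               && \text{(since } \mu t.\, v \times t = v \times \mu t.\, v \times t\text{)} \\
            &= J_{n-1}
\end{align*}
% Hence \theta(J_n \to u) = J_{n-1} \to u is an instance of the assumed type,
% the call f\ b has type u, and the body has type J_n \to u.
% For n = 0 the argument has type J_0 = v \times J_0, so b : J_0 and the
% assumed type J_0 \to u applies to b without any substitution.
```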

Therefore there is a substitution S such that S(T) is equivalent to $J_m \to u$. $J_m \to u$ has more
free variables than T; therefore, there is a free variable of T (referred to as e) such that S
maps e to a term equivalent to a subterm of $J_m \to u$ containing at least two free variables. I
will refer to the latter subterm as the “expansion term”. These are the subterms of $J_m \to u$,
modulo equivalence:

1. $J_m \to u$

2. $u$

3. $J_i$ $(1 \le i \le m)$

4. $w_i$ $(1 \le i \le m)$

5. $v$

6. $\mu t.\, v \times t$

Cases 2, 4, 5 and 6 do not contain at least two free variables, hence cannot be the expansion
term. Case 1 cannot be the expansion term, for then T = e, a single free variable, which is
not a type of f. Therefore the expansion term is $J_i$ (for some i, $1 \le i \le m$).

Let S' be the same substitution as S except that e is mapped to “int”. S'(T) is equivalent to
the tree for $J_m \to u$ with one or more subtrees equivalent to $J_i$ replaced by “int”. But since
$w_i$ occurs just once in the tree for “$J_m \to u$”, there is only one such subtree: the actual
occurrence of $J_i$ introduced by the production rules. Therefore S'(T) = $K_m \to u$ where

$K_i = \text{int}$

$K_n = w_n \times K_{n-1} \quad (n > i)$

It is easy to see that this is not a type of f, violating the assumption that T is a principal type.

A.3  Comments 

The principal type T of a term in Henglein’s type system is also a valid type of the term
when the type system has recursive types. Principal typing fails because
the addition of recursive types may allow new types for the term which are not instances of
T.

Appendix B: Ajax Foreign Code Specifications

I  provide  the  complete  text  of  the  foreign  code  specifications  used  by  Ajax.  They  cover  a 
large  part  of  the  JDK  1.1  class  library  for  Windows,  but  not  all  of  the  library.  I  provide  the 
specifications  to  indicate  how  extensive  they  are  and  how  much  modelling  is  required. 
Also,  the  curious  reader  can  see  how  I  modelled  the  behavior  of  specific  functions. 


/* Special definitions used by the SEMI analyzer.

   These definitions are used by the SEMI analyzer and by
   other native code specifications.

   These may not have constraints generated for them
   using the normal path (guided by the liveness query);
   SEMI may just decide to generate its own constraints for
   them as needed. We do this so that the details of how
   they are used are kept internal to SEMI.
*/

makeCharArray() {
    VALUE = new [C;
    java.lang.Object.<init>(VALUE);
    LEN = choose;
    VALUE java.lang.Object#arraylength := LEN;
L:  CH = choose;
    VALUE java.lang.Object#intarrayelement := CH;
    goto L, N;
N:  return = choose VALUE;
}

accessStringChars(STR) {
    STR java.lang.String.value;
    STR java.lang.String.offset;
    STR java.lang.String.count;
}

makeIntArray() {
    VALUE = new [I;
    java.lang.Object.<init>(VALUE);
    LEN = choose;
    VALUE java.lang.Object#arraylength := LEN;
L:  I = choose;
    VALUE java.lang.Object#intarrayelement := I;
    goto L, N;
N:  return = choose VALUE;
}

makeByteArray() {
    VALUE = new [B;
    java.lang.Object.<init>(VALUE);
    LEN = choose;
    VALUE java.lang.Object#arraylength := LEN;
L:  B = choose;
    VALUE java.lang.Object#intarrayelement := B;
    goto L, N;
N:  return = choose VALUE;
}

makeString() {
    VALUE = makeCharArray();
    STR = new java.lang.String;
    java.lang.String.<init>(STR, VALUE) "([C)V";
    return = choose STR;
}

mungeStrings(STR1, STR2) {
    VALUE = makeCharArray();
    goto L1, L2, N;
L1: CHARS = STR1 java.lang.String.value;
    goto R;
L2: CHARS = STR2 java.lang.String.value;
R:  CH = CHARS java.lang.Object#intarrayelement;
    VALUE java.lang.Object#intarrayelement := CH;
    goto L1, L2, N;
N:  STR = new java.lang.String;
    java.lang.String.<init>(STR, VALUE) "([C)V";
    return = choose STR, STR1, STR2;
}

initStringconst() {
    STR = makeString();
    java.lang.String#internstr := STR;
}

/* Exception functions */

/* _stringconst is invoked to generate a String constant
   used by one of the ldc* instructions.
   It's also used in native code specifications. */
_stringconst() {
    return = java.lang.String#internstr;
}

/* _magicexn is invoked at the start of a catch block to
   generate all the exceptions that could be caught there.
*/
_magicexn() {
    goto L0, L1, L2, L3, L4, L5, L6, L7, L8, L9, L10,
         L11, L12, L13, L14, L15, L16, L18, L19, L20, L21, L22,
         L23, L24, L25;
L0:
    STR = _stringconst();
    EXN = new java.lang.VirtualMachineError;
    java.lang.VirtualMachineError.<init>(EXN);
    java.lang.VirtualMachineError.<init>(EXN, STR);
    goto L;
L1:
    STR = _stringconst();
    EXN = new java.lang.LinkageError;
    java.lang.LinkageError.<init>(EXN);
    java.lang.LinkageError.<init>(EXN, STR);
    goto L;
L2:
    STR = _stringconst();
    EXN = new java.lang.NullPointerException;
    java.lang.NullPointerException.<init>(EXN);
    java.lang.NullPointerException.<init>(EXN, STR);
    goto L;
L3:
    STR = _stringconst();
    EXN = new java.lang.ArrayIndexOutOfBoundsException;
    INT = choose; // not linked to the actual array
                  // index used
    java.lang.ArrayIndexOutOfBoundsException.<init>(EXN);
    java.lang.ArrayIndexOutOfBoundsException.<init>(EXN, INT) "(I)V";
    java.lang.ArrayIndexOutOfBoundsException.<init>(EXN, STR) "(Ljava.lang.String;)V";
    goto L;
L4:
    STR = _stringconst();
    EXN = new java.lang.ArrayStoreException;
    java.lang.ArrayStoreException.<init>(EXN);
    java.lang.ArrayStoreException.<init>(EXN, STR);
    goto L;
L5:
    STR = _stringconst();
    EXN = new java.lang.ArithmeticException;
    java.lang.ArithmeticException.<init>(EXN);
    java.lang.ArithmeticException.<init>(EXN, STR);
    goto L;
L6:
    STR = _stringconst();
    EXN = new java.lang.NegativeArraySizeException;
    java.lang.NegativeArraySizeException.<init>(EXN);
    java.lang.NegativeArraySizeException.<init>(EXN, STR);
    goto L;
L7:
    STR = _stringconst();
    EXN = new java.lang.ClassCastException;
    java.lang.ClassCastException.<init>(EXN);
    java.lang.ClassCastException.<init>(EXN, STR);
    goto L;
L8:
    STR = _stringconst();
    EXN = new java.lang.IllegalMonitorStateException;
    java.lang.IllegalMonitorStateException.<init>(EXN);
    java.lang.IllegalMonitorStateException.<init>(EXN, STR);
    goto L;
L9:
    EXN = new java.lang.ThreadDeath;
    java.lang.ThreadDeath.<init>(EXN);
    goto L;
L10:
    STR = _stringconst();
    EXN = new java.lang.InternalError;
    java.lang.InternalError.<init>(EXN);
    java.lang.InternalError.<init>(EXN, STR);
    goto L;
L11:
    STR = _stringconst();
    EXN = new java.lang.OutOfMemoryError;
    java.lang.OutOfMemoryError.<init>(EXN);
    java.lang.OutOfMemoryError.<init>(EXN, STR);
    goto L;
L12:
    STR = _stringconst();
    EXN = new java.lang.StackOverflowError;
    java.lang.StackOverflowError.<init>(EXN);
    java.lang.StackOverflowError.<init>(EXN, STR);
    goto L;
L13:
    STR = _stringconst();
    EXN = new java.lang.UnknownError;
    java.lang.UnknownError.<init>(EXN);
    java.lang.UnknownError.<init>(EXN, STR);
    goto L;
L14:
    STR = _stringconst();
    EXN = new java.lang.AbstractMethodError;
    java.lang.AbstractMethodError.<init>(EXN);
    java.lang.AbstractMethodError.<init>(EXN, STR);
    goto L;
L15:
    STR = _stringconst();
    EXN = new java.lang.ClassCircularityError;
    java.lang.ClassCircularityError.<init>(EXN);
    java.lang.ClassCircularityError.<init>(EXN, STR);
    goto L;
L16:
    STR = _stringconst();
    EXN = new java.lang.ClassFormatError;
    java.lang.ClassFormatError.<init>(EXN);
    java.lang.ClassFormatError.<init>(EXN, STR);
    goto L;
L18:
    STR = _stringconst();
    EXN = new java.lang.IllegalAccessError;
    java.lang.IllegalAccessError.<init>(EXN);
    java.lang.IllegalAccessError.<init>(EXN, STR);
    goto L;
L19:
    STR = _stringconst();
    EXN = new java.lang.IncompatibleClassChangeError;
    java.lang.IncompatibleClassChangeError.<init>(EXN);
    java.lang.IncompatibleClassChangeError.<init>(EXN, STR);
    goto L;
L20:
    STR = _stringconst();
    EXN = new java.lang.InstantiationError;
    java.lang.InstantiationError.<init>(EXN);
    java.lang.InstantiationError.<init>(EXN, STR);
    goto L;
L21:
    STR = _stringconst();
    EXN = new java.lang.NoClassDefFoundError;
    java.lang.NoClassDefFoundError.<init>(EXN);
    java.lang.NoClassDefFoundError.<init>(EXN, STR);
    goto L;
L22:
    STR = _stringconst();
    EXN = new java.lang.NoSuchFieldError;
    java.lang.NoSuchFieldError.<init>(EXN);
    java.lang.NoSuchFieldError.<init>(EXN, STR);
    goto L;
L23:
    STR = _stringconst();
    EXN = new java.lang.NoSuchMethodError;
    java.lang.NoSuchMethodError.<init>(EXN);
    java.lang.NoSuchMethodError.<init>(EXN, STR);
    goto L;
L24:
    STR = _stringconst();
    EXN = new java.lang.UnsatisfiedLinkError;
    java.lang.UnsatisfiedLinkError.<init>(EXN);
    java.lang.UnsatisfiedLinkError.<init>(EXN, STR);
    goto L;
L25:
    STR = _stringconst();
    EXN = new java.lang.VerifyError;
    java.lang.VerifyError.<init>(EXN);
    java.lang.VerifyError.<init>(EXN, STR);
    goto L;
L:  return = choose EXN;
}

/* _wrapclassinitializerexn is invoked when a class
   initializer method <clinit> is called. Any exception
   thrown by <clinit> is passed through here to simulate the
   fact that the VM translates it to an
   ExceptionInInitializerError. */
_wrapclassinitializerexn(REALEXN) {
    STR = _stringconst();
    EXN = new java.lang.ExceptionInInitializerError;
    java.lang.ExceptionInInitializerError.<init>(EXN);
    java.lang.ExceptionInInitializerError.<init>(EXN, REALEXN) "(Ljava.lang.Throwable;)V";
    java.lang.ExceptionInInitializerError.<init>(EXN, STR) "(Ljava.lang.String;)V";
    return = choose EXN;
}

makeIOException() {
    STR = _stringconst();
    EXN = new java.io.IOException;
    java.io.IOException.<init>(EXN);
    java.io.IOException.<init>(EXN, STR);
    return = choose EXN;
}

/* java.io.ObjectInputStream */

java.io.ObjectInputStream.loadClass0(C, NAME) {
    return = java.lang.Class.forName(NAME);
}

makeInvalidClassException(CLASS) {
    STR = _stringconst();
    CNAME = _stringconst();
    EXN = new java.io.InvalidClassException;
    java.io.InvalidClassException.<init>(EXN, CNAME);
    java.io.InvalidClassException.<init>(EXN, CNAME, STR);
    return = choose EXN;
}

makeStreamCorruptedException() {
    STR = _stringconst();
    EXN = new java.io.StreamCorruptedException;
    java.io.StreamCorruptedException.<init>(EXN);
    java.io.StreamCorruptedException.<init>(EXN, STR);
    return = choose EXN;
}

java.io.ObjectInputStream.inputClassFields(THIS, OBJ, CLASS, FIELDS) {
    FIELD = FIELDS java.lang.Object#arrayelement;
    goto B, S, C, I, J, Z, F, D, L;
B:  BYTE = java.io.ObjectInputStream.readByte(THIS);
    EXN1 = catch (java.lang.Throwable) BYTE;
    ReflectionHandler_assignSerializedFieldBYTE(OBJ, CLASS, BYTE);
    goto DONE;
S:  SHORT = java.io.ObjectInputStream.readShort(THIS);
    EXN1 = catch (java.lang.Throwable) SHORT;
    ReflectionHandler_assignSerializedFieldSHORT(OBJ, CLASS, SHORT);
    goto DONE;
C:  CHAR = java.io.ObjectInputStream.readChar(THIS);
    EXN1 = catch (java.lang.Throwable) CHAR;
    ReflectionHandler_assignSerializedFieldCHAR(OBJ, CLASS, CHAR);
    goto DONE;
I:  INT = java.io.ObjectInputStream.readInt(THIS);
    EXN1 = catch (java.lang.Throwable) INT;
    ReflectionHandler_assignSerializedFieldINT(OBJ, CLASS, INT);
    goto DONE;
J:  LONG = java.io.ObjectInputStream.readLong(THIS);
    EXN1 = catch (java.lang.Throwable) LONG;
    ReflectionHandler_assignSerializedFieldLONG(OBJ, CLASS, LONG);
    goto DONE;
Z:  BOOL = java.io.ObjectInputStream.readBoolean(THIS);
    EXN1 = catch (java.lang.Throwable) BOOL;
    ReflectionHandler_assignSerializedFieldBOOL(OBJ, CLASS, BOOL);
    goto DONE;
F:  FLOAT = java.io.ObjectInputStream.readFloat(THIS);
    EXN1 = catch (java.lang.Throwable) FLOAT;
    ReflectionHandler_assignSerializedFieldFLOAT(OBJ, CLASS, FLOAT);
    goto DONE;
D:  DOUBLE = java.io.ObjectInputStream.readDouble(THIS);
    EXN1 = catch (java.lang.Throwable) DOUBLE;
    ReflectionHandler_assignSerializedFieldDOUBLE(OBJ, CLASS, DOUBLE);
    goto DONE;
L:  OBJECT = java.io.ObjectInputStream.readObject(THIS);
    EXN1 = catch (java.lang.Throwable) OBJECT;
    ReflectionHandler_assignSerializedFieldOBJECT(OBJ, CLASS, OBJECT);
DONE:
    EXN2 = makeClassNotFoundException();
    EXN3 = makeInvalidClassException(CLASS);
    EXN4 = makeStreamCorruptedException();
    throw = choose EXN1, EXN2, EXN3, EXN4;
}

java.io.ObjectInputStream.allocateNewObject(ACLASS, INITCLASS) {
    OBJ = ReflectionHandler_makeSerializedObject(ACLASS);
    EXN1 = makeInstantiationException();
    EXN2 = makeIllegalAccessException();
    throw = choose EXN1, EXN2;
    return = choose OBJ;
}

java.io.ObjectInputStream.allocateNewArray(ARRAYCLASS, LENGTH) {
    OBJ = ReflectionHandler_makeSerializedArray(ARRAYCLASS);
    return = choose OBJ;
}

java.io.ObjectInputStream.invokeObjectReader(THIS, OBJ, CLASS) {
    IO = ReflectionHandler_invoke_readObject(OBJ, CLASS, THIS);
    EXN1 = catch (java.lang.Throwable) IO;
    EXN2 = makeClassNotFoundException();
    EXN3 = makeInvalidClassException(CLASS);
    EXN4 = makeStreamCorruptedException();
    throw = choose EXN1, EXN2, EXN3, EXN4;
}

/* java.io.ObjectOutputStream */

java.io.ObjectOutputStream.outputClassFields(THIS, OBJ, CLASS, FIELDS) {
    FIELD = FIELDS java.lang.Object#arrayelement;
    goto B, S, C, I, J, Z, F, D, L;
B:  BYTE = ReflectionHandler_getSerializedFieldBYTE(OBJ, CLASS);
    IO = java.io.ObjectOutputStream.writeByte(THIS, BYTE);
    EXN1 = catch (java.lang.Throwable) IO;
    goto DONE;
S:  SHORT = ReflectionHandler_getSerializedFieldSHORT(OBJ, CLASS);
    IO = java.io.ObjectOutputStream.writeShort(THIS, SHORT);
    EXN1 = catch (java.lang.Throwable) IO;
    goto DONE;
C:  CHAR = ReflectionHandler_getSerializedFieldCHAR(OBJ, CLASS);
    IO = java.io.ObjectOutputStream.writeChar(THIS, CHAR);
    EXN1 = catch (java.lang.Throwable) IO;
    goto DONE;
I:  INT = ReflectionHandler_getSerializedFieldINT(OBJ, CLASS);
    IO = java.io.ObjectOutputStream.writeInt(THIS, INT);
    EXN1 = catch (java.lang.Throwable) IO;
    goto DONE;
J:  LONG = ReflectionHandler_getSerializedFieldLONG(OBJ, CLASS);
    IO = java.io.ObjectOutputStream.writeLong(THIS, LONG);
    EXN1 = catch (java.lang.Throwable) IO;
    goto DONE;
Z:  BOOL = ReflectionHandler_getSerializedFieldBOOL(OBJ, CLASS);
    IO = java.io.ObjectOutputStream.writeBoolean(THIS, BOOL);
    EXN1 = catch (java.lang.Throwable) IO;
    goto DONE;
F:  FLOAT = ReflectionHandler_getSerializedFieldFLOAT(OBJ, CLASS);
    IO = java.io.ObjectOutputStream.writeFloat(THIS, FLOAT);
    EXN1 = catch (java.lang.Throwable) IO;
    goto DONE;
D:  DOUBLE = ReflectionHandler_getSerializedFieldDOUBLE(OBJ, CLASS);
    IO = java.io.ObjectOutputStream.writeDouble(THIS, DOUBLE);
    EXN1 = catch (java.lang.Throwable) IO;
    goto DONE;
L:  OBJECT = ReflectionHandler_getSerializedFieldOBJECT(OBJ, CLASS);
    IO = java.io.ObjectOutputStream.writeObject(THIS, OBJECT);
    EXN1 = catch (java.lang.Throwable) IO;
DONE:
    EXN2 = makeInvalidClassException(CLASS);
    throw = choose EXN1, EXN2;
}

java.io.ObjectOutputStream.invokeObjectWriter(THIS, OBJ, CLASS) {
    IO = ReflectionHandler_invoke_writeObject(OBJ, CLASS, THIS);
    throw = catch (java.lang.Throwable) IO;
}

/* java.io.ObjectStreamClass */

java.io.ObjectStreamClass.getClassAccess(C) {
    return = java.lang.Class.getModifiers(C);
}

java.io.ObjectStreamClass.getMethodSignatures(C) {
    return = makeConstStringArray();
}

java.io.ObjectStreamClass.getMethodAccess(C, SIG) {
    return = choose;
}

java.io.ObjectStreamClass.getFieldSignatures(C) {
    return = makeConstStringArray();
}

java.io.ObjectStreamClass.getFieldAccess(C, SIG) {
    return = choose;
}

java.io.ObjectStreamClass.getFields0(C) {
    LIST = new [Ljava.io.ObjectStreamField;
    java.lang.Object.<init>(LIST);
    LEN = choose;
    LIST java.lang.Object#arraylength := LEN;
L:  VALUE = new java.io.ObjectStreamField;
    NAME = _stringconst();
    T = choose;
    O = choose;
    TS = _stringconst();
    java.io.ObjectStreamField.<init>(VALUE, NAME, T, O, TS);
    LIST java.lang.Object#arrayelement := VALUE;
    goto L, N;
N:  return = choose LIST;
}

java.io.ObjectStreamClass.getSerialVersionUID(C) {
    return = choose;
}

java.io.ObjectStreamClass.hasWriteObject(C) {
    return = choose;
}

/* java.io.FileDescriptor */

java.io.FileDescriptor.initSystemFD(FD, DESC) {
    FD java.io.FileDescriptor.fd := DESC;
    return = choose FD;
}

java.io.FileDescriptor.valid() {
    return = choose;
}

java.io.FileDescriptor.sync() {
    EXN = new java.io.SyncFailedException;
    STR = _stringconst();
    java.io.SyncFailedException.<init>(EXN, STR);
    throw = choose EXN;
}

/* java.io.FileInputStream */

java.io.FileInputStream.open(THIS, NAME) {
    FD = THIS java.io.FileInputStream.fd;
    NEWFD = choose;
    FD java.io.FileDescriptor.fd := NEWFD;
    throw = makeIOException();
}

makeInterruptedIOException() {
    STR = _stringconst();
    EXN = new java.io.InterruptedIOException;
    java.io.InterruptedIOException.<init>(EXN);
    java.io.InterruptedIOException.<init>(EXN, STR);
    NUM = choose;
    EXN java.io.InterruptedIOException.bytesTransferred := NUM;
    return = choose EXN;
}

java.io.FileInputStream.read(THIS) {
    return = choose;
    EXN1 = makeIOException();
    EXN2 = makeInterruptedIOException();
    throw = choose EXN1, EXN2;
    FD = THIS java.io.FileOutputStream.fd;
    OSFD = FD java.io.FileDescriptor.fd;
}

java.io.FileInputStream.readBytes(THIS, B, OFF, LEN) {
    return = choose LEN;
    EXN1 = makeIOException();
    EXN2 = makeInterruptedIOException();
    throw = choose EXN1, EXN2;
    FD = THIS java.io.FileOutputStream.fd;
    OSFD = FD java.io.FileDescriptor.fd;
}

java.io.FileInputStream.skip(THIS, N) {
    return = choose N;
    throw = makeIOException();
    FD = THIS java.io.FileOutputStream.fd;
    OSFD = FD java.io.FileDescriptor.fd;
}

java.io.FileInputStream.available(THIS) {
    return = choose;
    throw = makeIOException();
    FD = THIS java.io.FileOutputStream.fd;
    OSFD = FD java.io.FileDescriptor.fd;
}

java.io.FileInputStream.close(THIS) {
    throw = makeIOException();
    FD = THIS java.io.FileOutputStream.fd;
    OSFD = FD java.io.FileDescriptor.fd;
}

/* java.io.FileOutputStream */

java.io.FileOutputStream.open(THIS, NAME) {
    FD = THIS java.io.FileOutputStream.fd;
    NEWFD = choose;
    FD java.io.FileDescriptor.fd := NEWFD;
    throw = makeIOException();
}

java.io.FileOutputStream.openAppend(THIS, NAME) {
    FD = THIS java.io.FileOutputStream.fd;
    NEWFD = choose;
    FD java.io.FileDescriptor.fd := NEWFD;
    throw = makeIOException();
}

java.io.FileOutputStream.write(THIS, B) {
    EXN1 = makeIOException();
    EXN2 = makeInterruptedIOException();
    throw = choose EXN1, EXN2;
    FD = THIS java.io.FileOutputStream.fd;
    OSFD = FD java.io.FileDescriptor.fd;
}

java.io.FileOutputStream.writeBytes(THIS, B, OFF, LEN) {
    EXN1 = makeIOException();
    EXN2 = makeInterruptedIOException();
    throw = choose EXN1, EXN2;
    FD = THIS java.io.FileOutputStream.fd;
    OSFD = FD java.io.FileDescriptor.fd;
}

java.io.FileOutputStream.close(THIS) {
    throw = makeIOException();
    FD = THIS java.io.FileOutputStream.fd;
    OSFD = FD java.io.FileDescriptor.fd;
}

/* java.io.File */

java.io.File.lastModified0(THIS) {
    return = choose;
}

java.io.File.length0(THIS) {
    return = choose;
}

java.io.File.exists0(THIS) {
    return = choose;
}

java.io.File.canWrite0(THIS) {
    return = choose;
}

java.io.File.canRead0(THIS) {
    return = choose;
}

java.io.File.isFile0(THIS) {
    return = choose;
}

java.io.File.isDirectory0(THIS) {
    return = choose;
}

java.io.File.mkdir0(THIS) {
    return = choose;
}

java.io.File.delete0(THIS) {
    return = choose;
}

java.io.File.rmdir0(THIS) {
    return = choose;
}

java.io.File.renameTo0(THIS, DEST) {
    PATH = DEST java.io.File.path;
    THIS java.io.File.path := PATH;
    return = choose;
}

makeDynamicStringArray() {
    LIST = new [Ljava.lang.String;
    java.lang.Object.<init>(LIST);
    LEN = choose;
    LIST java.lang.Object#arraylength := LEN;

    L: STR = makeString();
    LIST java.lang.Object#arrayelement := STR;
    goto L, N;

    N: return = choose LIST;
}

/* java.lang.Object */

java.lang.Object.hashCode(THIS) {
    HASH = THIS java.lang.Object#identity;
    return = choose HASH;
}

java.lang.Object.getClass(THIS) {
    return = makeClass();
}

makeConstStringArray() {
    LIST = new [Ljava.lang.String;
    java.lang.Object.<init>(LIST);
    LEN = choose;
    LIST java.lang.Object#arraylength := LEN;

    L: STR = _stringconst();
    LIST java.lang.Object#arrayelement := STR;
    goto L, N;

    N: return = choose LIST;
}

java.io.File.list0(THIS) {
    return = makeDynamicStringArray();
}

java.io.File.canonPath(THIS) {
    CURPATH = THIS java.io.File.path;
    STR = makeString();
    return = mungeStrings(CURPATH, STR);
}

java.io.File.isAbsolute(THIS) {
    return = choose;
}

/* java.io.RandomAccessFile */

java.io.RandomAccessFile.open(THIS, NAME, WRITEABLE) {
    FD = THIS java.io.RandomAccessFile.fd;
    NEWFD = choose;
    FD java.io.FileDescriptor.fd := NEWFD;
    throw = makeIOException();
}

java.io.RandomAccessFile.read(THIS) {
    return = choose;
    EXN1 = makeIOException();
    EXN2 = makeInterruptedIOException();
    throw = choose EXN1, EXN2;
}

java.io.RandomAccessFile.readBytes(THIS, B, OFF, LEN) {
    return = choose LEN;
    EXN1 = makeIOException();
    EXN2 = makeInterruptedIOException();
    throw = choose EXN1, EXN2;
}

java.io.RandomAccessFile.write(THIS, B) {
    EXN1 = makeIOException();
    EXN2 = makeInterruptedIOException();
    throw = choose EXN1, EXN2;
}

java.io.RandomAccessFile.writeBytes(THIS, B, OFF, LEN) {
    EXN1 = makeIOException();
    EXN2 = makeInterruptedIOException();
    throw = choose EXN1, EXN2;
}

java.io.RandomAccessFile.getFilePointer(THIS) {
    return = choose;
    throw = makeIOException();
}

java.io.RandomAccessFile.seek(THIS, POS) {
    throw = makeIOException();
}

java.io.RandomAccessFile.length(THIS) {
    return = choose;
    throw = makeIOException();
}

java.io.RandomAccessFile.close(THIS) {
    throw = makeIOException();
}


java.lang.Object.clone(THIS) {
    STR = _stringconst();
    EXN1 = new java.lang.CloneNotSupportedException;
    java.lang.CloneNotSupportedException.<init>(EXN1);
    java.lang.CloneNotSupportedException.<init>(EXN1, STR);
    throw = choose EXN1;
    return = choose THIS;
}

makeIllegalMonitorStateException() {
    STR = _stringconst();
    EXN = new java.lang.IllegalMonitorStateException;
    java.lang.IllegalMonitorStateException.<init>(EXN);
    java.lang.IllegalMonitorStateException.<init>(EXN, STR);
    return = choose EXN;
}

java.lang.Object.notify(THIS) {
    throw = makeIllegalMonitorStateException();
}

java.lang.Object.notifyAll(THIS) {
    throw = makeIllegalMonitorStateException();
}

java.lang.Object.wait(THIS, TIMEOUT) {
    throw = makeIllegalMonitorStateException();
}

java.lang.Object.wait(THIS, TIMEOUT) {
    EXN1 = makeIllegalMonitorStateException();
    STR = _stringconst();
    EXN2 = new java.lang.IllegalArgumentException;
    java.lang.IllegalArgumentException.<init>(EXN2);
    java.lang.IllegalArgumentException.<init>(EXN2, STR);
    STR = _stringconst();
    EXN3 = new java.lang.InterruptedException;
    java.lang.InterruptedException.<init>(EXN3);
    java.lang.InterruptedException.<init>(EXN3, STR);
    throw = choose EXN1, EXN2, EXN3;
}

/* java.lang.Math */

java.lang.Math.sin(A) {
    return = choose;
}

java.lang.Math.cos(A) {
    return = choose;
}

java.lang.Math.tan(A) {
    return = choose;
}

java.lang.Math.asin(A) {
    return = choose;
}

java.lang.Math.acos(A) {
    return = choose;
}

java.lang.Math.atan(A) {
    return = choose;
}

java.lang.Math.exp(A) {
    return = choose;
}

java.lang.Math.log(A) {
    return = choose;
}

java.lang.Math.sqrt(A) {
    return = choose;
}

java.lang.Math.IEEERemainder(F1, F2) {
    return = choose;
}

java.lang.Math.ceil(A) {
    return = choose;
}

java.lang.Math.floor(A) {
    return = choose;
}

java.lang.Math.rint(A) {
    return = choose;
}

java.lang.Math.atan2(A, B) {
    return = choose;
}

java.lang.Math.pow(A, B) {
    return = choose;
}

/* java.lang.Float */

java.lang.Float.floatToIntBits(FLOAT) {
    return = choose;
}

java.lang.Float.intBitsToFloat(BITS) {
    return = choose;
}

/* java.lang.Double */

java.lang.Double.doubleToLongBits(DOUBLE) {
    return = choose;
}

java.lang.Double.longBitsToDouble(BITS) {
    return = choose;
}

java.lang.Double.valueOf0(S) {
    EXN = new java.lang.NumberFormatException;
    STR = _stringconst();
    java.lang.NumberFormatException.<init>(EXN);
    java.lang.NumberFormatException.<init>(EXN, STR);
    throw = choose EXN;
    return = choose;
}

/* java.lang.Throwable */

java.lang.Throwable.fillInStackTrace(THIS) {
    TRACE = choose;
    THIS java.lang.Throwable.backtrace := TRACE;
    return = choose THIS;
}

/* This doesn't really work. The printStackTrace0
   documentation says that the STREAM should have a
   println(char[]) method, but we don't know what class it's
   in, so how can we call it? We probably need lots of extra
   ugly support to get this really right. For now we just
   ignore the STREAM. */

java.lang.Throwable.printStackTrace0(THIS, STREAM) {
}

/* java.lang.Thread */

java.lang.Thread.currentThread() {
    T = java.lang.Thread#currentthread;
    return = choose T;
}

java.lang.Thread.yield() {
}

java.lang.Thread.sleep(MILLIS) {
    EXN = new java.lang.InterruptedException;
    STR = _stringconst();
    java.lang.InterruptedException.<init>(EXN);
    java.lang.InterruptedException.<init>(EXN, STR);
    throw = choose EXN;
}

java.lang.Thread.start(THIS) {
    EXN = new java.lang.IllegalThreadStateException;
    STR = _stringconst();
    java.lang.IllegalThreadStateException.<init>(EXN);
    java.lang.IllegalThreadStateException.<init>(EXN, STR);
    throw = choose EXN;
    java.lang.Thread.run(THIS);
}

// not sure what this does

java.lang.Thread.isInterrupted(THIS, CLEAR) {
    return = choose;
}

java.lang.Thread.isAlive(THIS) {
    return = choose;
}

java.lang.Thread.countStackFrames(THIS) {
    return = choose;
}

java.lang.Thread.setPriority0(THIS, PRIORITY) {
}

java.lang.Thread.stop0(THIS) {
}

java.lang.Thread.suspend0(THIS) {
}

java.lang.Thread.resume0(THIS) {
}

java.lang.Thread.interrupt0(THIS) {
}

/* java.lang.Compiler */

java.lang.Compiler.initialize() {
}

java.lang.Compiler.compileClass(C) {
    return = choose;
}

java.lang.Compiler.compileClasses(CS) {
    return = choose;
}

java.lang.Compiler.command(C) {
    return = choose;
}

java.lang.Compiler.enable() {
}

java.lang.Compiler.disable() {
}

/* java.lang.Win32Process */

java.lang.Win32Process.exitValue() {
    return = choose;
}

java.lang.Win32Process.waitFor() {
    return = choose;
}

java.lang.Win32Process.destroy() {
}

java.lang.Win32Process.create(CMD, ENV) {
    accessStringChars(CMD);
    accessStringChars(ENV);
}

java.lang.Win32Process.close() {
}

/* java.lang.Runtime */

java.lang.Runtime.exitInternal(THIS, STATUS) {
}

java.lang.Runtime.runFinalizersOnExit0(THIS, VALUE) {
}

java.lang.Runtime.execInternal(THIS, CMDARRAY, ENVP) {
    PROCESS = new java.lang.Win32Process;
    java.lang.Win32Process.<init>(PROCESS, CMDARRAY, ENVP);
    return = choose PROCESS;
}

java.lang.Runtime.freeMemory(THIS) {
    return = choose;
}

java.lang.Runtime.totalMemory(THIS) {
    return = choose;
}

java.lang.Runtime.gc(THIS) {
}

java.lang.Runtime.runFinalization(THIS) {
}

java.lang.Runtime.traceInstructions(THIS, ON) {
}

java.lang.Runtime.traceMethodCalls(THIS, ON) {
}

java.lang.Runtime.initializeLinkerInternal(THIS) {
    return = java.lang.String#internstr;
}

java.lang.Runtime.buildLibName(THIS, PATHNAME, FILENAME) {
    BUF = new java.lang.StringBuffer;
    java.lang.StringBuffer.<init>(BUF, PATHNAME)
        "(Ljava.lang.String;)V";
    STR = java.lang.String#internstr;
    java.lang.StringBuffer.append(BUF, STR)
        "(Ljava.lang.String;)Ljava.lang.StringBuffer;";
    java.lang.StringBuffer.append(BUF, FILENAME)
        "(Ljava.lang.String;)Ljava.lang.StringBuffer;";
    STR = java.lang.String#internstr;
    java.lang.StringBuffer.append(BUF, STR)
        "(Ljava.lang.String;)Ljava.lang.StringBuffer;";
    return = java.lang.StringBuffer.toString(BUF);
}

java.lang.Runtime.loadFileInternal(THIS, FILENAME) {
    return = choose;
}

/* java.lang.String */
java.lang.String.intern(THIS) {
    goto Y, N;

    Y: java.lang.String#internstr := THIS;

    N: return = java.lang.String#internstr;
}

/* java.lang.System */

java.lang.System.currentTimeMillis() {
    // this just returns an arbitrary fresh value
    return = choose;
}

java.lang.System.identityHashCode(OBJ) {
    HASH = OBJ java.lang.Object#identity;
    return = choose HASH;
}

// This one might need to be changed. In particular, it might call
// Properties.read

java.lang.System.initProperties(PROPS) {
    PROP = makeString();
    STR = makeString();
    java.util.Hashtable.put(PROPS, PROP, STR);
    return = choose PROPS;
}

java.lang.System.setIn0(IN) {
    java.lang.System.in := IN;
}

java.lang.System.setOut0(OUT) {
    java.lang.System.out := OUT;
}

java.lang.System.setErr0(ERR) {
    java.lang.System.err := ERR;
}

java.lang.System.setIn0(IN) {
    java.lang.System.in := IN;
}

java.lang.System.arraycopy(FROM, FROMOFF, TO, TOOFF, LEN) {
    VAL = FROM java.lang.Object#arrayelement;
    TO java.lang.Object#arrayelement := VAL;
    VAL = FROM java.lang.Object#intarrayelement;
    TO java.lang.Object#intarrayelement := VAL;
    VAL = FROM java.lang.Object#floatarrayelement;
    TO java.lang.Object#floatarrayelement := VAL;
    VAL = FROM java.lang.Object#longarrayelement;
    TO java.lang.Object#longarrayelement := VAL;
    VAL = FROM java.lang.Object#doublearrayelement;
    TO java.lang.Object#doublearrayelement := VAL;
}

/* java.lang.Class */
makeClass() {
    CLASS = new java.lang.Class;
    java.lang.Class.<init>(CLASS);
    java.lang.Class#internclass := CLASS;
    return = java.lang.Class#internclass;
}

makeSigner() {
    return = java.lang.Class#internsigner;
}

makeClassArray() {
    CS = new [Ljava.lang.Class;
    java.lang.Object.<init>(CS);
    LEN = choose;
    CS java.lang.Object#arraylength := LEN;

    L: C = makeClass();
    CS java.lang.Object#arrayelement := C;
    goto L, N;

    N: return = choose CS;
}

makeField(CLASS) {
    FIELD = new java.lang.reflect.Field;
    java.lang.reflect.Field.<init>(FIELD);
    FIELD java.lang.reflect.Field.clazz := CLASS;
    SLOT = choose;
    FIELD java.lang.reflect.Field.slot := SLOT;
    NAME = _stringconst();
    FIELD java.lang.reflect.Field.name := NAME;
    TYPE = makeClass();
    FIELD java.lang.reflect.Field.type := TYPE;
    java.lang.Field#internfield := FIELD;
    return = java.lang.Field#internfield;
}

makeMethod(CLASS) {
    METHOD = new java.lang.reflect.Method;
    java.lang.reflect.Method.<init>(METHOD);
    METHOD java.lang.reflect.Method.clazz := CLASS;
    SLOT = choose;
    METHOD java.lang.reflect.Method.slot := SLOT;
    NAME = _stringconst();
    METHOD java.lang.reflect.Method.name := NAME;
    RETURNTYPE = makeClass();
    METHOD java.lang.reflect.Method.returnType := RETURNTYPE;
    PARAMETERTYPES = makeClassArray();
    METHOD java.lang.reflect.Method.parameterTypes := PARAMETERTYPES;
    EXCEPTIONTYPES = makeClassArray();
    METHOD java.lang.reflect.Method.exceptionTypes := EXCEPTIONTYPES;
    MODS = choose;
    METHOD java.lang.reflect.Constructor#mods := MODS;
    java.lang.reflect.Method#internmethod := METHOD;
    return = java.lang.reflect.Method#internmethod;
}

makeConstructor(CLASS) {
    CONSTRUCTOR = new java.lang.reflect.Constructor;
    java.lang.reflect.Constructor.<init>(CONSTRUCTOR);
    CONSTRUCTOR java.lang.reflect.Constructor.clazz := CLASS;
    SLOT = choose;
    CONSTRUCTOR java.lang.reflect.Constructor.slot := SLOT;
    PARAMETERTYPES = makeClassArray();
    CONSTRUCTOR java.lang.reflect.Constructor.parameterTypes :=
        PARAMETERTYPES;
    EXCEPTIONTYPES = makeClassArray();
    CONSTRUCTOR java.lang.reflect.Constructor.exceptionTypes :=
        EXCEPTIONTYPES;
    MODS = choose;
    CONSTRUCTOR java.lang.reflect.Constructor#mods := MODS;
    java.lang.reflect.Constructor#internconstructor := CONSTRUCTOR;
    return = java.lang.reflect.Constructor#internconstructor;
}

makeInstantiationException() {
    STR = _stringconst();
    EXN = new java.lang.InstantiationException;
    java.lang.InstantiationException.<init>(EXN);
    java.lang.InstantiationException.<init>(EXN, STR);
    return = choose EXN;
}

makeIllegalAccessException() {
    STR = _stringconst();
    EXN = new java.lang.IllegalAccessException;
    java.lang.IllegalAccessException.<init>(EXN);
    java.lang.IllegalAccessException.<init>(EXN, STR);
    return = choose EXN;
}

makeIllegalArgumentException() {
    STR = _stringconst();
    EXN = new java.lang.IllegalArgumentException;
    java.lang.IllegalArgumentException.<init>(EXN);
    java.lang.IllegalArgumentException.<init>(EXN, STR);
    return = choose EXN;
}

makeInvocationTargetException(CATCH) {
    STR = _stringconst();
    EXN = new java.lang.reflect.InvocationTargetException;
    java.lang.reflect.InvocationTargetException.<init>(EXN);
    java.lang.reflect.InvocationTargetException.<init>(EXN, CATCH);
    java.lang.reflect.InvocationTargetException.<init>(EXN, CATCH, STR);
    return = choose EXN;
}

makeClassNotFoundException() {
    STR = _stringconst();
    EXN = new java.lang.ClassNotFoundException;
    java.lang.ClassNotFoundException.<init>(EXN);
    java.lang.ClassNotFoundException.<init>(EXN, STR);
    return = choose EXN;
}

java.lang.Class.forName(NAME) {
    throw = makeClassNotFoundException();
    return = makeClass();
}

java.lang.Class.newInstance(CLASS) {
    OBJ = ReflectionHandler_makeObjectAndCallZeroArgConstructor(CLASS);
    EXN1 = makeInstantiationException();
    EXN2 = makeIllegalAccessException();
    throw = choose EXN1, EXN2;
    return = choose OBJ;
}

java.lang.Class.isInstance(C) {
    return = choose;
}

java.lang.Class.isAssignableFrom(C) {
    return = choose;
}

java.lang.Class.isInterface(C) {
    return = choose;
}

java.lang.Class.isArray(C) {
    return = choose;
}

java.lang.Class.isPrimitive(C) {
    return = choose;
}

java.lang.Class.getName(C) {
    STR = _stringconst();
    return = choose STR;
}

java.lang.Class.getClassLoader(C) {
    return = makeClassLoader();
}

java.lang.Class.getSuperclass(C) {
    return = makeClass();
}

java.lang.Class.getInterfaces(C) {
    return = makeClassArray();
}

java.lang.Class.getComponentType(C) {
    return = makeClass();
}

java.lang.Class.getModifiers(C) {
    return = choose;
}

java.lang.Class.getSigners(C) {
    OS = new [Ljava.lang.Object;
    java.lang.Object.<init>(OS);
    LEN = choose;
    OS java.lang.Object#arraylength := LEN;

    L: O = makeSigner();
    OS java.lang.Object#arrayelement := O;
    goto L, N;

    N: return = choose OS;
}

java.lang.Class.setSigners(OS) {
    L: O = OS java.lang.Object#arrayelement;
    java.lang.Class#internsigner := O;
    goto L, N;

    N: return = choose;
}

java.lang.Class.getPrimitiveClass(NAME) {
    return = makeClass();
}

java.lang.Class.getDeclaringClass(C) {
    return = makeClass();
}

java.lang.Class.getClasses(C) {
    return = makeClassArray();
}

java.lang.Class.getFields0(THIS, WHICH) {
    FS = new [Ljava.lang.reflect.Field;
    java.lang.Object.<init>(FS);
    LEN = choose;
    FS java.lang.Object#arraylength := LEN;

    L: F = makeField(THIS);
    FS java.lang.Object#arrayelement := F;
    goto L, N;

    N: return = choose FS;
}

java.lang.Class.getField0(THIS, NAME, WHICH) {
    STR = _stringconst();
    EXN = new java.lang.NoSuchFieldException;
    java.lang.NoSuchFieldException.<init>(EXN);
    java.lang.NoSuchFieldException.<init>(EXN, STR);
    throw = choose EXN;
    return = makeField(THIS);
}

java.lang.Class.getMethods0(THIS, WHICH) {
    MS = new [Ljava.lang.reflect.Method;
    java.lang.Object.<init>(MS);
    LEN = choose;
    MS java.lang.Object#arraylength := LEN;

    L: M = makeMethod(THIS);
    MS java.lang.Object#arrayelement := M;
    goto L, N;

    N: return = choose MS;
}

makeNoSuchMethodException() {
    STR = _stringconst();
    EXN = new java.lang.NoSuchMethodException;
    java.lang.NoSuchMethodException.<init>(EXN);
    java.lang.NoSuchMethodException.<init>(EXN, STR);
    return = choose EXN;
}

java.lang.Class.getMethod0(THIS, NAME, PARAMETERTYPES, WHICH) {
    throw = makeNoSuchMethodException();
    return = makeMethod(THIS);
}

java.lang.Class.getConstructors0(THIS, WHICH) {
    CS = new [Ljava.lang.reflect.Constructor;
    java.lang.Object.<init>(CS);
    LEN = choose;
    CS java.lang.Object#arraylength := LEN;

    L: C = makeConstructor(THIS);
    CS java.lang.Object#arrayelement := C;
    goto L, N;

    N: return = choose CS;
}

java.lang.Class.getConstructor0(THIS, PARAMETERTYPES, WHICH) {
    throw = makeNoSuchMethodException();
    return = makeConstructor(THIS);
}

/* java.lang.ClassLoader */
makeClassLoader() {
    return = java.lang.ClassLoader#internloader;
}

java.lang.ClassLoader.init(THIS) {
    java.lang.ClassLoader#internloader := THIS;
}

java.lang.ClassLoader.defineClass0(THIS, NAME, DATA, OFFSET, LENGTH) {
    return = makeClass();
}

java.lang.ClassLoader.resolveClass0(THIS, C) {
}

java.lang.ClassLoader.findSystemClass0(THIS, NAME) {
    throw = makeClassNotFoundException();
    return = makeClass();
}

java.lang.ClassLoader.getSystemResourceAsStream0(THIS, NAME) {
    URL = java.lang.ClassLoader.getSystemResource(NAME);
    return = java.net.URL.openStream(URL);
}

java.lang.ClassLoader.getSystemResourceAsName0(THIS, NAME) {
    return = _stringconst();
}

/* java.lang.reflect.Constructor */

java.lang.reflect.Constructor.getModifiers(THIS) {
    return = THIS java.lang.reflect.Constructor#mods;
}

java.lang.reflect.Constructor.newInstance(THIS, ARGS) {
    ARGS java.lang.Object#arraylength;
    OBJ = ReflectionHandler_makeObjectAndCallArbitraryConstructor(ARGS);
    CATCH = catch (java.lang.Throwable) OBJ;
    EXN1 = makeInstantiationException();
    EXN2 = makeIllegalAccessException();
    EXN3 = makeIllegalArgumentException();
    EXN4 = makeInvocationTargetException(CATCH);
    throw = choose EXN1, EXN2, EXN3, EXN4;
    return = choose OBJ;
}

/* java.lang.reflect.Method */

java.lang.reflect.Method.getModifiers(THIS) {
    return = THIS java.lang.reflect.Method#mods;
}

java.lang.reflect.Method.invoke(THIS, TARGET, ARGS) {
    ARGS java.lang.Object#arraylength;
    OBJ = ReflectionHandler_callArbitraryMethod(TARGET, ARGS);
    CATCH = catch (java.lang.Throwable) OBJ;
    EXN2 = makeIllegalAccessException();
    EXN3 = makeIllegalArgumentException();
    EXN4 = makeInvocationTargetException(CATCH);
    throw = choose EXN2, EXN3, EXN4;
    return = choose OBJ;
}


/* java.util.ResourceBundle */

java.util.ResourceBundle.getClassContext() {
    return = makeClassArray();
}

/* java.util.zip.Inflater */

java.util.zip.Inflater.setDictionary(THIS, B, OFF, LEN) {
    THIS java.util.zip.Inflater.strm;
    NEWNEEDDICT = choose;
    THIS java.util.zip.Inflater.needsDictionary := NEWNEEDDICT;
}

java.util.zip.Inflater.inflate(THIS, B, OFF, LEN) {
    THIS java.util.zip.Inflater.strm;
    VAL = choose;
    B java.lang.Object#intarrayelement := VAL;
    NEWLEN = choose;
    THIS java.util.zip.Inflater.len := NEWLEN;
    NEWTOTALIN = choose;
    THIS java.util.zip.Inflater#totalIn := NEWTOTALIN;
    NEWTOTALOUT = choose;
    THIS java.util.zip.Inflater#totalOut := NEWTOTALOUT;
    NEWOFF = choose;
    THIS java.util.zip.Inflater.off := NEWOFF;
    NEWFINISHED = choose;
    THIS java.util.zip.Inflater.finished := NEWFINISHED;
    NEWNEEDDICT = choose;
    THIS java.util.zip.Inflater.needsDictionary := NEWNEEDDICT;
    EXN = new java.util.zip.DataFormatException;
    STR = _stringconst();
    java.util.zip.DataFormatException.<init>(EXN);
    java.util.zip.DataFormatException.<init>(EXN, STR);
    throw = choose EXN;
}

java.util.zip.Inflater.getAdler(THIS) {
    THIS java.util.zip.Inflater.strm;
    return = choose;
}

java.util.zip.Inflater.getTotalIn(THIS) {
    THIS java.util.zip.Inflater.strm;
    return = THIS java.util.zip.Inflater#totalIn;
}

java.util.zip.Inflater.getTotalOut(THIS) {
    THIS java.util.zip.Inflater.strm;
    return = THIS java.util.zip.Inflater#totalOut;
}

java.util.zip.Inflater.reset(THIS) {
    THIS java.util.zip.Inflater.strm;
    NEWTOTALIN = choose;
    THIS java.util.zip.Inflater#totalIn := NEWTOTALIN;
    NEWTOTALOUT = choose;
    THIS java.util.zip.Inflater#totalOut := NEWTOTALOUT;
    NEWFINISHED = choose;
    THIS java.util.zip.Inflater.finished := NEWFINISHED;
    NEWNEEDDICT = choose;
    THIS java.util.zip.Inflater.needsDictionary := NEWNEEDDICT;
}

java.util.zip.Inflater.end(THIS) {
    THIS java.util.zip.Inflater.strm;
}

java.util.zip.Inflater.init(THIS, NOWRAP) {
    STRM = choose;
    THIS java.util.zip.Inflater.strm := STRM;
    java.util.zip.Inflater.reset(THIS);
}


/* java.util.zip.Deflater */

accessDeflater(THIS) {
    THIS java.util.zip.Deflater.setParams;
    THIS java.util.zip.Deflater.strm;
    THIS java.util.zip.Deflater.finish;
    THIS java.util.zip.Deflater.level;
    THIS java.util.zip.Deflater.strategy;
    FALSE = choose;
    THIS java.util.zip.Deflater.setParams := FALSE;
}

java.util.zip.Deflater.setDictionary(THIS, B, OFF, LEN) {
    accessDeflater(THIS);
}

java.util.zip.Deflater.deflate(THIS, B, OFF, LEN) {
    accessDeflater(THIS);
    VAL = choose;
    B java.lang.Object#intarrayelement := VAL;
    NEWLEN = choose;
    THIS java.util.zip.Deflater.len := NEWLEN;
    NEWTOTALIN = choose;
    THIS java.util.zip.Deflater#totalIn := NEWTOTALIN;
    NEWTOTALOUT = choose;
    THIS java.util.zip.Deflater#totalOut := NEWTOTALOUT;
    NEWOFF = choose;
    THIS java.util.zip.Deflater.off := NEWOFF;
    NEWFINISHED = choose;
    THIS java.util.zip.Deflater.finished := NEWFINISHED;
    return = choose;
}

java.util.zip.Deflater.getAdler(THIS) {
    accessDeflater(THIS);
    return = choose;
}

java.util.zip.Deflater.getTotalIn(THIS) {
    accessDeflater(THIS);
    return = THIS java.util.zip.Deflater#totalIn;
}

java.util.zip.Deflater.getTotalOut(THIS) {
    accessDeflater(THIS);
    return = THIS java.util.zip.Deflater#totalOut;
}

java.util.zip.Deflater.reset(THIS) {
    accessDeflater(THIS);
    NEWTOTALIN = choose;
    THIS java.util.zip.Deflater#totalIn := NEWTOTALIN;
    NEWTOTALOUT = choose;
    THIS java.util.zip.Deflater#totalOut := NEWTOTALOUT;
    NEWFINISHED = choose;
    THIS java.util.zip.Deflater.finished := NEWFINISHED;
}

java.util.zip.Deflater.end(THIS) {
    accessDeflater(THIS);
}

java.util.zip.Deflater.init(THIS, NOWRAP) {
    STRM = choose;
    THIS java.util.zip.Deflater.strm := STRM;
    java.util.zip.Deflater.reset(THIS);
}


/* java.util.zip.CRC32 */

java.util.zip.CRC32.update(THIS, B, OFF, LEN) {
    VAL = choose;
    THIS java.util.zip.CRC32.crc := VAL;
    B java.lang.Object#intarrayelement;
}

java.util.zip.CRC32.update1(THIS, B) {
    VAL = choose;
    THIS java.util.zip.CRC32.crc := VAL;
}

/* java.awt.image.ColorModel */

java.awt.image.ColorModel.deletepData(THIS) {
}

/* sun.awt.windows.WToolkit */

sun.awt.windows.WToolkit.init(THIS, EVENTTHREAD /* java.lang.Thread */) {
}

sun.awt.windows.WToolkit.eventLoop(THIS) {
T:  goto EA, EB, EC, ED, EE, EF, EG, EH, EI, EJ, EK, EL,
         EM, EN, EY, EZ, EO, E1, E2, E3, E4, E5, E6, EXIT;

EA: TARGET = sun.awt.windows.WComponentPeer#allPeers;
    ACTION = choose;
    sun.awt.windows.WChoicePeer.handleAction(TARGET, ACTION);
    goto T;

EB: TARGET = sun.awt.windows.WComponentPeer#allPeers;
    sun.awt.windows.WButtonPeer.handleAction(TARGET);
    goto T;

EC: TARGET = sun.awt.windows.WComponentPeer#allPeers;
    AMT = choose;
    sun.awt.windows.WScrollbarPeer.lineUp(TARGET, AMT);
    goto T;

ED: TARGET = sun.awt.windows.WComponentPeer#allPeers;
    AMT = choose;
    sun.awt.windows.WScrollbarPeer.lineDown(TARGET, AMT);
    goto T;

EE: TARGET = sun.awt.windows.WComponentPeer#allPeers;
    AMT = choose;
    sun.awt.windows.WScrollbarPeer.pageUp(TARGET, AMT);
    goto T;

EF: TARGET = sun.awt.windows.WComponentPeer#allPeers;
    AMT = choose;
    sun.awt.windows.WScrollbarPeer.pageDown(TARGET, AMT);
    goto T;

EG: TARGET = sun.awt.windows.WComponentPeer#allPeers;
    AMT = choose;
    sun.awt.windows.WScrollbarPeer.dragBegin(TARGET, AMT);
    goto T;

EH: TARGET = sun.awt.windows.WComponentPeer#allPeers;
    AMT = choose;
    sun.awt.windows.WScrollbarPeer.dragAbsolute(TARGET, AMT);
    goto T;

EI: TARGET = sun.awt.windows.WComponentPeer#allPeers;
    AMT = choose;
    sun.awt.windows.WScrollbarPeer.dragEnd(TARGET, AMT);
    goto T;

EJ: TARGET = sun.awt.windows.WMenuItemPeer#menuItemPeers;
    CODE = choose;
    sun.awt.windows.WMenuItemPeer.handleAction(TARGET, CODE);
    goto T;

EK: TARGET = sun.awt.windows.WComponentPeer#allPeers;
    sun.awt.windows.WFileDialogPeer.handleCancel(TARGET);
    goto T;

EL: TARGET = sun.awt.windows.WComponentPeer#allPeers;
    STR = makeString();
    sun.awt.windows.WFileDialogPeer.handleSelected(TARGET, STR);
    goto T;

EM: TARGET = sun.awt.windows.WComponentPeer#allPeers;
    sun.awt.windows.WWindowPeer.postFocusOnActivate(TARGET);
    goto T;

EN: TARGET = sun.awt.windows.WComponentPeer#allPeers;
    sun.awt.windows.WTextFieldPeer.handleAction(TARGET);
    goto T;

EY: TARGET = sun.awt.windows.WComponentPeer#allPeers;
    X = choose;
    Y = choose;
    W = choose;
    H = choose;
    sun.awt.windows.WComponentPeer.handleRepaint(TARGET, X, Y, W, H);
    goto T;

EZ: TARGET = sun.awt.windows.WComponentPeer#allPeers;
    X = choose;
    Y = choose;
    W = choose;
    H = choose;
    sun.awt.windows.WComponentPeer.handleExpose(TARGET, X, Y, W, H);
    goto T;

EO: TARGET = sun.awt.windows.WComponentPeer#allPeers;
    X = choose;
    Y = choose;
    W = choose;
    H = choose;
    sun.awt.windows.WComponentPeer.handlePaint(TARGET, X, Y, W, H);
    goto T;

E1: CLIPBOARD = sun.awt.windows.WToolkit#theClipboard;
    sun.awt.windows.WClipboard.lostSelectionOwnership(CLIPBOARD);
    goto T;

E2: EVT = new java.awt.event.KeyEvent;
    TARGET = sun.awt.windows.WComponentPeer#allPeers;
    TARGET = TARGET sun.awt.windows.WObjectPeer.target;
    ID = choose;
    WHEN = choose;
    MODS = choose;
    KEYCODE = choose;
    KEYCHAR = choose;
    java.awt.event.KeyEvent.<init>(EVT, TARGET, ID, WHEN,
        MODS, KEYCODE, KEYCHAR);
    goto POST;

E3: EVT = new java.awt.event.MouseEvent;
    TARGET = sun.awt.windows.WComponentPeer#allPeers;
    TARGET = TARGET sun.awt.windows.WObjectPeer.target;
    ID = choose;
    WHEN = choose;
    MODS = choose;
    X = choose;
    Y = choose;
    CLICKS = choose;
    POPUP = choose;
    java.awt.event.MouseEvent.<init>(EVT, TARGET, ID,
        WHEN, MODS, X, Y, CLICKS, POPUP);
    goto POST;

E4: EVT = new java.awt.event.WindowEvent;
    TARGET = sun.awt.windows.WComponentPeer#allPeers;
    TARGET = TARGET sun.awt.windows.WObjectPeer.target;
    ID = choose;
    java.awt.event.WindowEvent.<init>(EVT, TARGET, ID);
    goto POST;

E5: TARGET = sun.awt.windows.WComponentPeer#allPeers;
    sun.awt.windows.WTextComponentPeer.valueChanged(TARGET);
    goto T;

E6: EVT = new java.awt.event.FocusEvent;
    TARGET = sun.awt.windows.WComponentPeer#allPeers;
    TARGET = TARGET sun.awt.windows.WObjectPeer.target;
    ID = choose;
    ISTMP = choose;
    java.awt.event.FocusEvent.<init>(EVT, TARGET, ID, ISTMP);
    goto POST;

POST:
    sun.awt.windows.WToolkit.postEvent(EVT);
    goto T;

EXIT:
    choose;
}

sun.awt.windows.WToolkit.getComboHeightOffset() {
    return = choose; /* int */
}

sun.awt.windows.WToolkit.makeColorModel() {
    BITS = choose;
    RMASK = choose;
    GMASK = choose;
    BMASK = choose;
    AMASK = choose;
    M1 = new java.awt.image.DirectColorModel;
    java.awt.image.DirectColorModel.<init>(M1, BITS,
        RMASK, GMASK, BMASK, AMASK);
    SIZE = choose;
    CMAP = makeByteArray();
    START = choose;
    HASALPHA = choose;
    TRANS = choose;
    M2 = new java.awt.image.IndexColorModel;
    java.awt.image.IndexColorModel.<init>(M2, BITS, SIZE,
        CMAP, START, HASALPHA, TRANS) "(II[BIZI)V";
    return = choose M1, M2;
}

sun.awt.windows.WToolkit.getScreenResolution(THIS) {
    return = choose; /* int */
}

sun.awt.windows.WToolkit.getScreenWidth(THIS) {
    return = choose; /* int */
}

sun.awt.windows.WToolkit.getScreenHeight(THIS) {
    return = choose; /* int */
}

sun.awt.windows.WToolkit.sync(THIS) {
}

sun.awt.windows.WToolkit.beep(THIS) {
}

sun.awt.windows.WToolkit.loadSystemColors(THIS,
        COLORARRAY /* int[] */) {
    COLORARRAY java.lang.Object#arraylength;
    VAL = choose;
    COLORARRAY java.lang.Object#intarrayelement := VAL;
}

/* sun.awt.windows.WObjectPeer */

sun.awt.windows.WObjectPeer.initIDs() {
}

/* sun.awt.windows.WComponentPeer */

makePoint(X, Y) {
    X = choose;
    Y = choose;
    P = new java.awt.Point;
    java.awt.Point.<init>(P, X, Y);
    return = choose P;
}

sun.awt.windows.WComponentPeer._beginValidate(THIS) {
}

sun.awt.windows.WComponentPeer.endValidate(THIS) {
}

sun.awt.windows.WComponentPeer.start(THIS) {
    X = choose;
    Y = choose;
    THIS sun.awt.windows.WComponentPeer#X := X;
    THIS sun.awt.windows.WComponentPeer#Y := Y;
    sun.awt.windows.WComponentPeer#allPeers := THIS;
}

sun.awt.windows.WComponentPeer._dispose(THIS) {
}

sun.awt.windows.WComponentPeer.disable(THIS) {
}

sun.awt.windows.WComponentPeer.enable(THIS) {
}

sun.awt.windows.WComponentPeer.hide(THIS) {
}

sun.awt.windows.WComponentPeer.show(THIS) {
}

sun.awt.windows.WComponentPeer.reshape(THIS, X, Y, W, H) {
    THIS sun.awt.windows.WComponentPeer#X := X;
    THIS sun.awt.windows.WComponentPeer#Y := Y;
}

sun.awt.windows.WComponentPeer.getLocationOnScreen(THIS) {
    X = THIS sun.awt.windows.WComponentPeer#X;
    Y = THIS sun.awt.windows.WComponentPeer#Y;
    P = new java.awt.Point;
    java.awt.Point.<init>(P, X, Y);
    return = choose P;
}

sun.awt.windows.WComponentPeer.setCursor(THIS, CURSOR) {
}

sun.awt.windows.WComponentPeer.setFont(THIS, FONT) {
}

sun.awt.windows.WComponentPeer.setZOrderPosition(THIS, COMPONENT) {
}

sun.awt.windows.WComponentPeer._setBackground(THIS, COLOR) {
}

sun.awt.windows.WComponentPeer._setForeground(THIS, COLOR) {
}

sun.awt.windows.WComponentPeer.addNativeDropTarget(THIS) {
}

sun.awt.windows.WComponentPeer.removeNativeDropTarget(THIS) {
}

sun.awt.windows.WComponentPeer.nativeHandleEvent(THIS, EVENT) {
}

sun.awt.windows.WComponentPeer.requestFocus(THIS) {
}

/* sun.awt.windows.WWindowPeer */

sun.awt.windows.WWindowPeer.create(THIS, PARENT) {
    PDATA = choose;
    THIS sun.awt.windows.WObjectPeer.pData := PDATA;
}

sun.awt.windows.WWindowPeer._setResizable(THIS, BOOL) {
}

sun.awt.windows.WWindowPeer._setTitle(THIS, STR) {
}

sun.awt.windows.WWindowPeer.toBack(THIS) {
}

sun.awt.windows.WWindowPeer.toFront(THIS) {
}

sun.awt.windows.WWindowPeer.updateInsets(THIS, INSETS) {
}

sun.awt.windows.WWindowPeer.getContainerElement(THIS, CONTAINER, INDEX) {
    return = java.awt.Container.getComponent(CONTAINER, INDEX);
}

/* sun.awt.windows.WFramePeer */

sun.awt.windows.WFramePeer.create(THIS, PARENT) {
    PDATA = choose;
    THIS sun.awt.windows.WObjectPeer.pData := PDATA;
    STATE = choose;
    THIS sun.awt.windows.WFramePeer#state := STATE;
}

sun.awt.windows.WFramePeer.getState(THIS) {
    return = THIS sun.awt.windows.WFramePeer#state;
}

sun.awt.windows.WFramePeer._setIconImage(THIS, REP) {
}

sun.awt.windows.WFramePeer.getSysIconHeight(THIS) {
    return = choose;
}

sun.awt.windows.WFramePeer.getSysIconWidth(THIS) {
    return = choose;
}

sun.awt.windows.WFramePeer.pSetIMMOption(THIS, STR) {
}

sun.awt.windows.WFramePeer.reshape(THIS, X, Y, W, H) {
    sun.awt.windows.WComponentPeer.reshape(THIS, X, Y, W, H);
}

sun.awt.windows.WFramePeer.setIconImageFromIntRasterData(
        THIS, BITS, DATAWIDTH, PIXHEIGHT, PIXWIDTH) {
}

sun.awt.windows.WFramePeer.setMenuBar0(THIS, MENUBAR) {
}

sun.awt.windows.WFramePeer.setState(THIS, STATE) {
    THIS sun.awt.windows.WFramePeer#state := STATE;
}

/* sun.awt.windows.WDialogPeer */

sun.awt.windows.WDialogPeer.create(THIS, PARENT) {
    PDATA = choose;
    THIS sun.awt.windows.WObjectPeer.pData := PDATA;
}

sun.awt.windows.WDialogPeer.showModal(THIS) {
}

sun.awt.windows.WDialogPeer.endModal(THIS) {
}

sun.awt.windows.WDialogPeer.pSetIMMOption(THIS, STR) {
}

/* sun.awt.windows.WFileDialogPeer */

sun.awt.windows.WFileDialogPeer.initIDs() {
}

sun.awt.windows.WFileDialogPeer.show(THIS) {
}

sun.awt.windows.WFileDialogPeer.targetSetDirectory_NoClientCode(THIS, DIALOG, STR) {
    DIALOG java.awt.FileDialog.file := STR;
}

sun.awt.windows.WFileDialogPeer.targetSetFile_NoClientCode(THIS, DIALOG, STR) {
    DIALOG java.awt.FileDialog.dir := STR;
}

/* sun.awt.windows.WChoicePeer */

sun.awt.windows.WChoicePeer.create(THIS, PARENT) {
    PDATA = choose;
    THIS sun.awt.windows.WObjectPeer.pData := PDATA;
}

sun.awt.windows.WChoicePeer.addItem(THIS, STR, INDEX) {
}

sun.awt.windows.WChoicePeer.remove(THIS, INDEX) {
}

sun.awt.windows.WChoicePeer.select(THIS, INDEX) {
}

sun.awt.windows.WChoicePeer.reshape(THIS, X, Y, W, H) {
    sun.awt.windows.WComponentPeer.reshape(THIS, X, Y, W, H);
}

/* sun.awt.windows.WCanvasPeer */

sun.awt.windows.WCanvasPeer.create(THIS, PARENT) {
    PDATA = choose;
    THIS sun.awt.windows.WObjectPeer.pData := PDATA;
}

/* sun.awt.windows.WMenuItemPeer */

sun.awt.windows.WMenuItemPeer.create(THIS, MENU) {
    PDATA = choose;
    THIS sun.awt.windows.WObjectPeer.pData := PDATA;
    sun.awt.windows.WMenuItemPeer#menuItemPeers := THIS;
}

sun.awt.windows.WMenuItemPeer._dispose(THIS) {
}

sun.awt.windows.WMenuItemPeer._setLabel(THIS, STR) {
}

sun.awt.windows.WMenuItemPeer.enable(THIS, BOOL) {
}

sun.awt.windows.WMenuItemPeer.initIDs() {
}

/* sun.awt.windows.WMenuPeer */

sun.awt.windows.WMenuPeer.createMenu(THIS, MENUBAR) {
    PDATA = choose;
    THIS sun.awt.windows.WObjectPeer.pData := PDATA;
}

sun.awt.windows.WMenuPeer.createSubMenu(THIS, MENU) {
    PDATA = choose;
    THIS sun.awt.windows.WObjectPeer.pData := PDATA;
}

sun.awt.windows.WMenuPeer.addSeparator(THIS) {
}

sun.awt.windows.WMenuPeer.delItem(THIS, INDEX) {
}

/* sun.awt.windows.WMenuBarPeer */

sun.awt.windows.WMenuBarPeer.create(THIS, FRAME) {
    PDATA = choose;
    THIS sun.awt.windows.WObjectPeer.pData := PDATA;
}

sun.awt.windows.WMenuBarPeer.addMenu(THIS, MENU) {
}

sun.awt.windows.WMenuBarPeer.delMenu(THIS, INDEX) {
}

/* sun.awt.windows.WCheckboxMenuItemPeer */

sun.awt.windows.WCheckboxMenuItemPeer.setState(THIS, BOOL) {
}

/* sun.awt.windows.WTextComponentPeer */

sun.awt.windows.WTextComponentPeer.enableEditing(THIS, BOOL) {
}

sun.awt.windows.WTextComponentPeer.getSelectionStart(THIS) {
    return = THIS sun.awt.windows.WTextComponentPeer#selectfrom;
}

sun.awt.windows.WTextComponentPeer.getSelectionEnd(THIS) {
    return = THIS sun.awt.windows.WTextComponentPeer#selectto;
}

sun.awt.windows.WTextComponentPeer.select(THIS, FROM, TO) {
    THIS sun.awt.windows.WTextComponentPeer#selectfrom := FROM;
    THIS sun.awt.windows.WTextComponentPeer#selectto := TO;
}

sun.awt.windows.WTextComponentPeer.getText(THIS) {
    return = THIS sun.awt.windows.WTextComponentPeer#text;
}

sun.awt.windows.WTextComponentPeer.setText(THIS, STR) {
    THIS sun.awt.windows.WTextComponentPeer#text := STR;
}

sun.awt.windows.WTextComponentPeer.initIDs() {
}

/* sun.awt.windows.WTextAreaPeer */

sun.awt.windows.WTextAreaPeer.create(THIS, PARENT) {
    PDATA = choose;
    THIS sun.awt.windows.WObjectPeer.pData := PDATA;
}

sun.awt.windows.WTextAreaPeer.insertText(THIS, STR, POS) {
    TEXT = THIS sun.awt.windows.WTextComponentPeer#text;
    NEWTEXT = mungeStrings(TEXT, STR);
    THIS sun.awt.windows.WTextComponentPeer#text := NEWTEXT;
}

sun.awt.windows.WTextAreaPeer.replaceText(THIS, STR, FROM, TO) {
    TEXT = THIS sun.awt.windows.WTextComponentPeer#text;
    NEWTEXT = mungeStrings(TEXT, STR);
    THIS sun.awt.windows.WTextComponentPeer#text := NEWTEXT;
}

/* sun.awt.windows.WTextFieldPeer */

sun.awt.windows.WTextFieldPeer.create(THIS, PARENT) {
    PDATA = choose;
    THIS sun.awt.windows.WObjectPeer.pData := PDATA;
}

sun.awt.windows.WTextFieldPeer.setEchoCharacter(THIS, CH) {
}

/* sun.awt.windows.WLabelPeer */

sun.awt.windows.WLabelPeer.create(THIS, PARENT) {
    PDATA = choose;
    THIS sun.awt.windows.WObjectPeer.pData := PDATA;
}

sun.awt.windows.WLabelPeer.setAlignment(THIS, ALIGN) {
}

sun.awt.windows.WLabelPeer.setText(THIS, STR) {
}

/* sun.awt.windows.WCheckboxPeer */

sun.awt.windows.WCheckboxPeer.create(THIS, PARENT) {
    PDATA = choose;
    THIS sun.awt.windows.WObjectPeer.pData := PDATA;
}

sun.awt.windows.WCheckboxPeer.setCheckboxGroup(THIS, GROUP) {
}

sun.awt.windows.WCheckboxPeer.setLabel(THIS, STR) {
}

sun.awt.windows.WCheckboxPeer.setState(THIS, BOOL) {
}

/* sun.awt.windows.WButtonPeer */

sun.awt.windows.WButtonPeer.create(THIS, PARENT) {
    PDATA = choose;
    THIS sun.awt.windows.WObjectPeer.pData := PDATA;
}

sun.awt.windows.WButtonPeer.initIDs() {
}

sun.awt.windows.WButtonPeer.setLabel(THIS, STR) {
}

/* sun.awt.windows.WListPeer */

sun.awt.windows.WListPeer.create(THIS, PARENT) {
    PDATA = choose;
    THIS sun.awt.windows.WObjectPeer.pData := PDATA;
    MAXWIDTH = choose;
    THIS sun.awt.windows.WListPeer#maxwidth := MAXWIDTH;
}

sun.awt.windows.WListPeer._addItem(THIS, STR, INDEX, WIDTH) {
    goto Y, N;
Y:  THIS sun.awt.windows.WListPeer#maxwidth := WIDTH;
N:  choose;
}

sun.awt.windows.WListPeer.addItem0(THIS, STR, INDEX, WIDTH) {
    goto Y, N;
Y:  THIS sun.awt.windows.WListPeer#maxwidth := WIDTH;
N:  choose;
}

sun.awt.windows.WListPeer.delItems(THIS, FROM, TO) {
}

sun.awt.windows.WListPeer.setMultipleSelections(THIS, BOOL) {
}

sun.awt.windows.WListPeer.select(THIS, INDEX) {
}

sun.awt.windows.WListPeer.deselect(THIS, INDEX) {
}

sun.awt.windows.WListPeer.isSelected(THIS, INDEX) {
    return = choose;
}

sun.awt.windows.WListPeer.makeVisible(THIS, INDEX) {
}

sun.awt.windows.WListPeer.updateMaxItemWidth(THIS) {
}

sun.awt.windows.WListPeer.getMaxWidth(THIS, WIDTH) {
    return = THIS sun.awt.windows.WListPeer#maxwidth;
}

/* sun.awt.windows.WClipboard */

sun.awt.windows.WClipboard.getClipboardText(THIS) {
    goto N, R;
N:  STR = makeString();
    THIS sun.awt.windows.WClipboard#text := STR;
R:  return = THIS sun.awt.windows.WClipboard#text;
}

sun.awt.windows.WClipboard.init() {
}

sun.awt.windows.WClipboard.setClipboardObject(THIS, OBJ) {
    sun.awt.windows.WToolkit#theClipboard := THIS;
    THIS sun.awt.windows.WClipboard#text := OBJ;
}

sun.awt.windows.WClipboard.setClipboardText(THIS, STRSEL) {
    sun.awt.windows.WToolkit#theClipboard := THIS;
    DATA = STRSEL java.awt.datatransfer.StringSelection.data;
    THIS sun.awt.windows.WClipboard#text := DATA;
}

/* sun.awt.windows.WColor */

sun.awt.windows.WColor.getDefaultColor(INDEX) {
    return = choose;
}

/* sun.awt.windows.WFontMetrics */

sun.awt.windows.WFontMetrics.initIDs() {
}

sun.awt.windows.WFontMetrics.init(THIS) {
    INTS = makeIntArray();
    THIS sun.awt.windows.WFontMetrics.widths := INTS;
    V = choose;
    THIS sun.awt.windows.WFontMetrics.ascent := V;
    V = choose;
    THIS sun.awt.windows.WFontMetrics.descent := V;
    V = choose;
    THIS sun.awt.windows.WFontMetrics.leading := V;
    V = choose;
    THIS sun.awt.windows.WFontMetrics.height := V;
    V = choose;
    THIS sun.awt.windows.WFontMetrics.maxAscent := V;
    V = choose;
    THIS sun.awt.windows.WFontMetrics.maxDescent := V;
    V = choose;
    THIS sun.awt.windows.WFontMetrics.maxHeight := V;
    V = choose;
    THIS sun.awt.windows.WFontMetrics.maxAdvance := V;
}

sun.awt.windows.WFontMetrics.bytesWidth(THIS, BYTES, INDEX, LEN) {
    return = choose;
}

sun.awt.windows.WFontMetrics.charsWidth(THIS, CHARS, INDEX, LEN) {
    return = choose;
}

sun.awt.windows.WFontMetrics.stringWidth(THIS, STR) {
    return = choose;
}

sun.awt.windows.WFontMetrics.needsConversion(FONT, FONTDESC) {
    return = choose;
}

sun.awt.windows.WFontMetrics.getMFCharSegmentWidth(THIS,
        FONT, FONTDESC, BOOL, CHARS, FROM, TO, SEGS, LEN) {
    return = choose;
}

/* sun.awt.windows.WDefaultFontCharset */

sun.awt.windows.WDefaultFontCharset.initIDs() {
}

sun.awt.windows.WDefaultFontCharset.canConvert(THIS, CH) {
    return = choose;
}

/* sun.awt.windows.WPrintJob */

sun.awt.windows.WPrintJob.initIDs() {
}

sun.awt.windows.WPrintJob.newPage(THIS) {
}

sun.awt.windows.WPrintJob.flushPageImpl(THIS) {
}

sun.awt.windows.WPrintJob.endImpl(THIS) {
}

/* sun.awt.windows.WGraphics */

sun.awt.windows.WGraphics.initIDs() {
}

sun.awt.windows.WGraphics.checkNoDDraw() {
    return = choose;
}

sun.awt.windows.WGraphics.createFromComponent(THIS, COMP) {
    PDATA = choose;
    THIS sun.awt.windows.WGraphics.pData := PDATA;
}

sun.awt.windows.WGraphics.createFromGraphics(THIS, G) {
    PDATA = choose;
    THIS sun.awt.windows.WGraphics.pData := PDATA;
}

sun.awt.windows.WGraphics.createFromHDC(THIS, HDC) {
    PDATA = choose;
    THIS sun.awt.windows.WGraphics.pData := PDATA;
}

sun.awt.windows.WGraphics.createFromPrintJob(THIS, JOB) {
    PDATA = choose;
    THIS sun.awt.windows.WGraphics.pData := PDATA;
}

sun.awt.windows.WGraphics.disposeImpl(THIS) {
}

sun.awt.windows.WGraphics.W32LockViewResources(THIS,
        DATA, VIEWX, VIEWY, VIEWW, VIEWH, LOCKMETHOD) {
    return = choose;
}

sun.awt.windows.WGraphics.W32UnLockViewResources(THIS, DATA) {
    return = choose;
}

sun.awt.windows.WGraphics.getClipBounds(THIS) {
    X = choose;
    Y = choose;
    W = choose;
    H = choose;
    RECT = new java.awt.Rectangle;
    java.awt.Rectangle.<init>(RECT, X, Y, W, H);
    return = choose RECT;
}

sun.awt.windows.WGraphics.changeClip(THIS, X, Y, W, H, BOOL) {
}

sun.awt.windows.WGraphics.removeClip(THIS) {
}

sun.awt.windows.WGraphics.clearRect(THIS, X, Y, W, H) {
}

sun.awt.windows.WGraphics.drawRect(THIS, X, Y, W, H) {
}

sun.awt.windows.WGraphics.fillRect(THIS, X, Y, W, H) {
}

sun.awt.windows.WGraphics.drawLine(THIS, X, Y, X2, Y2) {
}

sun.awt.windows.WGraphics.copyArea(THIS, X, Y, W, H, DX, DY) {
}

sun.awt.windows.WGraphics.drawArc(THIS, X, Y, W, H, FROM, TO) {
}

sun.awt.windows.WGraphics.fillArc(THIS, X, Y, W, H, FROM, TO) {
}

sun.awt.windows.WGraphics.drawOval(THIS, X, Y, W, H) {
}

sun.awt.windows.WGraphics.fillOval(THIS, X, Y, W, H) {
}

sun.awt.windows.WGraphics.drawPolygon(THIS, XS, YS, COUNT) {
}

sun.awt.windows.WGraphics.fillPolygon(THIS, XS, YS, COUNT) {
}

sun.awt.windows.WGraphics.drawPolyline(THIS, XS, YS, COUNT) {
}

sun.awt.windows.WGraphics.drawRoundRect(THIS, X, Y, W, H, RX, RY) {
}

sun.awt.windows.WGraphics.fillRoundRect(THIS, X, Y, W, H, RX, RY) {
}

sun.awt.windows.WGraphics.print(THIS, COMPONENT) {
}

sun.awt.windows.WGraphics.devClearRect(THIS, X, Y, W, H) {
}

sun.awt.windows.WGraphics.devCopyArea(THIS, X, Y, W, H, DX, DY) {
}

sun.awt.windows.WGraphics.devDrawArc(THIS, X, Y, W, H, FROM, TO) {
}

sun.awt.windows.WGraphics.devFillArc(THIS, X, Y, W, H, FROM, TO) {
}

sun.awt.windows.WGraphics.devDrawLine(THIS, X, Y, X2, Y2) {
}

sun.awt.windows.WGraphics.devDrawOval(THIS, X, Y, W, H) {
}

sun.awt.windows.WGraphics.devFillOval(THIS, X, Y, W, H) {
}

sun.awt.windows.WGraphics.devDrawPolygon(THIS, XS, YS, COUNT) {
}

sun.awt.windows.WGraphics.devFillPolygon(THIS, XS, YS, COUNT) {
}

sun.awt.windows.WGraphics.devDrawPolyline(THIS, XS, YS, COUNT) {
}

sun.awt.windows.WGraphics.devDrawRect(THIS, X, Y, W, H) {
}

sun.awt.windows.WGraphics.devFillRect(THIS, X, Y, W, H) {
}

sun.awt.windows.WGraphics.devDrawRoundRect(THIS, X, Y, W, H, RX, RY) {
}

sun.awt.windows.WGraphics.devFillRoundRect(THIS, X, Y, W, H, RX, RY) {
}

sun.awt.windows.WGraphics.devFillSpans(THIS, ITERATOR, LONG) {
}

sun.awt.windows.WGraphics.devPrint(THIS, COMPONENT) {
}

sun.awt.windows.WGraphics.drawSFChars(THIS, CHARS, FROM, TO, X, Y) {
}

sun.awt.windows.WGraphics.drawMFCharsSegment(THIS, FONT,
        FONTDESC, CHARS, FROM, TO, X, Y) {
    return = choose;
}

sun.awt.windows.WGraphics.drawMFCharsConvertedSegment(THIS,
        FONT, FONTDESC, BYTES, LEN, X, Y) {
    return = choose;
}

sun.awt.windows.WGraphics.drawBytes(THIS, BYTES, FROM, TO, X, Y) {
    return = choose;
}

sun.awt.windows.WGraphics.drawBytesWidth(THIS, BYTES, FROM, TO, X, Y) {
    return = choose;
}

sun.awt.windows.WGraphics.drawCharsWidth(THIS, CHARS, FROM, TO, X, Y) {
    return = choose;
}

sun.awt.windows.WGraphics.drawStringWidth(THIS, STR, X, Y) {
    return = choose;
}

sun.awt.windows.WGraphics.pSetFont(THIS, FONT) {
}

sun.awt.windows.WGraphics.pSetForeground(THIS, COLOR) {
}

sun.awt.windows.WGraphics.setPaintMode(THIS) {
}

sun.awt.windows.WGraphics.pSetPaintMode(THIS) {
}

sun.awt.windows.WGraphics.setXORMode(THIS, COLOR) {
}

sun.awt.windows.WGraphics.pSetXORMode(THIS, COLOR) {
}

sun.awt.windows.WGraphics.setOrigin(THIS, X, Y) {
}

sun.awt.windows.WGraphics.imageCreate(THIS, IMAGE) {
}

/* sun.awt.image.ImageRepresentation */

sun.awt.image.ImageRepresentation.offscreenInit(THIS, COLOR) {
}

sun.awt.image.ImageRepresentation.disposeImage(THIS) {
}

convertPixel(CM, DATA) {
    PIXEL = DATA java.lang.Object#intarrayelement;
    java.awt.image.ColorModel.getAlpha(CM, PIXEL);
    java.awt.image.ColorModel.getRed(CM, PIXEL);
    java.awt.image.ColorModel.getGreen(CM, PIXEL);
    java.awt.image.ColorModel.getBlue(CM, PIXEL);
}

sun.awt.image.ImageRepresentation.setBytePixels(THIS, X,
        Y, W, H, CM, BYTES, OFF, LEN) {
    convertPixel(CM, BYTES);
}

sun.awt.image.ImageRepresentation.setIntPixels(THIS, X,
        Y, W, H, CM, INTS, OFF, LEN) {
    convertPixel(CM, INTS);
}

sun.awt.image.ImageRepresentation.finish(THIS, BOOL) {
}

sun.awt.image.ImageRepresentation.imageDraw(THIS, G, X, Y, COLOR) {
}

sun.awt.image.ImageRepresentation.imageStretch(THIS, G,
        X, Y, W, H, FROMX, FROMY, FROMW, FROMH, COLOR) {
}

/* sun.awt.image.OffScreenImageSource */

sun.awt.image.OffScreenImageSource.sendPixels(THIS) {
    CONSUMER = THIS sun.awt.image.OffScreenImageSource.theConsumer;
L:  X = choose;
    Y = choose;
    W = choose;
    H = choose;
    CM = sun.awt.windows.WToolkit.makeColorModel();
    BYTES = makeByteArray();
    OFF = choose;
    LEN = choose;
    java.awt.image.ImageConsumer.setPixels(CONSUMER, X,
        Y, W, H, CM, BYTES, OFF, LEN)
        "(IIIILjava.awt.image.ColorModel;[BII)V";
    goto L, EX;
EX: choose;
}

/* sun.awt.image.JPEGImageDecoder */

sun.awt.image.JPEGImageDecoder.readImage(THIS, STREAM, BYTES) {
N:  INPUT = makeByteArray();
    OFF = choose;
    LEN = choose;
    BYTE = java.io.InputStream.read(STREAM, BYTES, OFF, LEN);
    EXN1 = catch (java.lang.Throwable) BYTE;
    DATA = choose;
    BYTES java.lang.Object#intarrayelement := DATA;
    goto N, EX;
EX: STR = _stringconst();
    ERR = sun.awt.image.JPEGImageDecoder.error(STR);
    EXN2 = catch (java.lang.Throwable) ERR;
    throw = choose EXN1, EXN2;
}

/* sun.awt.image.GifImageDecoder */

sun.awt.image.GifImageDecoder.parseImage(THIS, X, Y, W,
        H, BOOL, FLAGS, HEADER, OUTPUT, CM) {
N:  INPUT = makeByteArray();
    OFF = choose;
    LEN = choose;
    RESULT = sun.awt.image.GifImageDecoder.readBytes(THIS, INPUT, OFF, LEN);
    DATA = choose;
    OUTPUT java.lang.Object#intarrayelement := DATA;
    goto N, EX;
EX: return = choose;
}

Appendix C: Ajax Reflection Specifications

Here  I  provide  the  complete  text  of  the  reflection  specifications  used  by  Ajax.  They  cover 
the  examples  I  used  for  this  thesis. 
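
As a concrete illustration of what these specifications describe, the sketch below shows a reflective call site of the kind they cover. The bytecode alone does not reveal which class `Class.forName` will load or which class will be instantiated, so the analysis needs outside information about the classes a given caller may produce. The class `ReflectionDemo` and its method `instantiate` are hypothetical names chosen for this example; they are not part of Ajax or of the benchmark programs.

```java
// Hypothetical illustration (not part of Ajax): a reflective call site whose
// target class is only known at run time.
public class ReflectionDemo {
    // `name` is an ordinary string, so a static analysis cannot tell from the
    // code which class is loaded and instantiated here.
    static Object instantiate(String name) throws ReflectiveOperationException {
        Class<?> c = Class.forName(name);
        // Modern equivalent of the older Class.newInstance() call.
        return c.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Object o = instantiate("java.lang.StringBuilder");
        System.out.println(o.getClass().getName());
    }
}
```

Reading the listing's format by analogy, an entry for this caller would presumably name `ReflectionDemo.instantiate` inside the group for the reflective method and list `class=java.lang.StringBuilder` in its body; this reading is an assumption, not a statement of Ajax's documented syntax.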


java.lang.Class.newInstance [

ajax.analyzer.test.ReflectionTest.main {
    class=ajax.analyzer.test.ReflectionTest
}

sun.io.CharToByteConverter.getDefault {
    class=sun.io.CharToByteCp1252
    # sun.io.CharToByte*
}

sun.io.ByteToCharConverter.getDefault {
    class=sun.io.ByteToCharCp1252
    # sun.io.ByteToChar*
}

sun.io.ByteToCharConverter.getConverter {
    class=sun.io.ByteToCharCp1252
    # sun.io.ByteToChar*
}

java.net.URL.getURLStreamHandler {
    class=*.Handler
}

java.net.InetAddress.<clinit> {
    class=java.net.*InetAddressImpl
}

java.security.Security.getImpl {
    class=sun.security.provider.*
}

java.security.Provider.loadProvider {
    class=sun.security.provider.Sun
}

java.util.ResourceBundle.findBundle {
    class=java.text.resources.DateFormatZoneData
    # java.text.resources.DateFormatZoneData*
    class=java.text.resources.DateFormatZoneData_en
    class=java.text.resources.LocaleElements
    # java.text.resources.LocaleElements*
    class=java.text.resources.LocaleElements_en
}

sun.security.x509.AlgorithmId.buildAlgorithmId {
    class=sun.security.*<sun.security.x509.AlgorithmId
}

sun.security.x509.X509Key.buildX509Key {
    class=sun.security.x509.X509Key
}

java.awt.Toolkit.getDefaultToolkit {
    class=sun.awt.windows.WToolkit
}

ladybug.engine.FormulaSolver.createSolver {
    class=ladybug.selenum.createSolver
}

sun.awt.SunToolkit.<init> {
    class=java.awt.EventQueue
}

sun.awt.windows.WFontPeer.getFontCharset {
    class=sun.io.CharToByteCp1252
}

sun.awt.windows.WFontMetrics.getMFStringWidth {
    class=sun.io.CharToByteCp1252
}

sun.awt.windows.WGraphics.drawMFChars {
    class=sun.io.CharToByteCp1252
}

ladybug.parse.Formula.createPeer {
}

ladybug.parse.Term.createPeer {
}

jess.Main.main {
    class=jess.StringFunctions
    class=jess.PredFunctions
    class=jess.MultiFunctions
    class=jess.MiscFunctions
    class=jess.MathFunctions
    class=jess.BagFunctions
    class=jess.reflect.ReflectFunctions
    class=jess.view.ViewFunctions
}

jess.Funcall.loadIntrinsics {
    class=jess.Assert
    class=jess.Retract
    class=jess.RetractString
    class=jess.Printout
    class=jess.ExtractGlobal
    class=jess.Open
    class=jess.Close
    class=jess.Foreach
    class=jess.Read
    class=jess.Readline
    class=jess.GensymStar
    class=jess.While
    class=jess.If
    class=jess.Bind
    class=jess.Modify
    class=jess.And
    class=jess.Not
    class=jess.Or
    class=jess.Eq
    class=jess.EqStar
    class=jess.Equals
    class=jess.NotEquals
    class=jess.Gt
    class=jess.Lt
    class=jess.GtOrEq
    class=jess.LtOrEq
    class=jess.Neg
    class=jess.Mod
    class=jess.Plus
    class=jess.Times
    class=jess.Minus
    class=jess.Divide
    class=jess.SymCat
    class=jess.LoadFacts
    class=jess.SaveFacts
    class=jess.AssertString
    class=jess.UnDefrule
    class=jess.Try
}

jess.LoadPkg.call {
    class=jess.*<jess.Userpackage
}

jess.LoadFn.call {
    class=jess.*<jess.Userfunction
}

jess.SetStrategy.call {
    class=jess.*<jess.Strategy
}

"jess.NodeTest.addTest(int, int, int, jess.Value)" {
    class=jess.*<jess.Test
}

jess.Rete.<init> {
    class=jess.depth
}

]


j ava. lang. Class . forName  [ 

aj  ax. analyzer . test . ReflectionTest .main  { 

class=aj  ax. analyzer . test . ReflectionTest 

} 

aj  ax. tools . benchmarks . GeneralBenchmark.makePrintSinkStrea 
m  { 

class=j  ava. io. OutputStream 
class=j  ava. io. PrintStream 

} 

sun. io. CharToByteConverter . getConverterClass  { 
class=sun. io. CharToByteCpl252  # 

sun. io. CharToByte* 

} 

sun. io. ByteToCharConverter . getConverterClass  { 
class=sun. io. ByteToCharCpl252  # 

sun . io . ByteToChar* 

} 

j  ava. io. Obj  ectStreamClass . <clinit>  { 
class=j  ava. io. Serializable 
class=j  ava. io. Externalizable 

} 
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j  ava . net . URL . getURLStreamHandler  { 
class=* .Handler 


class=jess.*<jess.Test 


java.net.InetAddress.<clinit> {
class=java.net.*InetAddressImpl
}

java.security.Security.getImpl {
class=sun.security.provider.*
}

java.security.Provider.loadProvider {
class=sun.security.provider.Sun
}

sun.security.x509.AlgorithmId.buildAlgorithmId {
class=sun.security.*<sun.security.x509.AlgorithmId
}

sun.security.x509.X509Key.buildX509Key {
class=sun.security.x509.X509Key
}

java.awt.Toolkit.getDefaultToolkit {
class=sun.awt.windows.WToolkit
}

ladybug.engine.SchemaSolver.solverClasses {
class=ladybug.selenum.SelEnumSolver
}

sun.awt.SunToolkit.<init> {
class=java.awt.EventQueue
}

sun.awt.windows.WFontPeer.getFontCharset {
class=sun.io.CharToByteCp1252
}

javafig.objects.FigAttribs.<clinit> {
class=java.awt.geom.AffineTransform
}

jess.Main.main {
class=jess.StringFunctions
class=jess.PredFunctions
class=jess.MultiFunctions
class=jess.MiscFunctions
class=jess.MathFunctions
class=jess.BagFunctions
class=jess.reflect.ReflectFunctions
class=jess.view.ViewFunctions
}

jess.Funcall.loadIntrinsics {
class=jess.Assert
class=jess.Retract
class=jess.RetractString
class=jess.Printout
class=jess.ExtractGlobal
class=jess.Open
class=jess.Close
class=jess.Foreach
class=jess.Read
class=jess.Readline
class=jess.GensymStar
class=jess.While
class=jess.If
class=jess.Bind
class=jess.Modify
class=jess.And
class=jess.Not
class=jess.Or
class=jess.Eq
class=jess.EqStar
class=jess.Equals
class=jess.NotEquals
class=jess.Gt
class=jess.Lt
class=jess.GtOrEq
class=jess.LtOrEq
class=jess.Neq
class=jess.Mod
class=jess.Plus
class=jess.Times
class=jess.Minus
class=jess.Divide
class=jess.SymCat
class=jess.LoadFacts
class=jess.SaveFacts
class=jess.AssertString
class=jess.UnDefrule
class=jess.Try
}

jess.LoadPkg.call {
class=jess.*<jess.Userpackage
}

jess.LoadFn.call {
class=jess.*<jess.Userfunction
}

jess.SetStrategy.call {
class=jess.*<jess.Strategy
}

"jess.NodeTest.addTest(int, int, int, jess.Value)" {
class=jess.*<jess.Test
}

jess.Rete.<init> {
class=jess.depth
}

] 

java.lang.Class.getConstructor [

javafig.gui.ModularEditor.handleCommandCallback {
}

ajax.tools.benchmarks.GeneralBenchmark.makePrintSinkStream {
}

]

java.lang.reflect.Constructor.newInstance [

javafig.gui.ModularEditor.handleCommandCallback {
class=javafig.commands.*
}

ajax.tools.benchmarks.GeneralBenchmark.makePrintSinkStream {
class=java.io.PrintStream
}

]

java.lang.Class.getMethod [

ajax.analyzer.test.ReflectionTest.main {
method=ajax.analyzer.test.ReflectionTest.*
}

ajax.analyzer.test.ReflectionTest.hello {
method=ajax.analyzer.test.ReflectionTest.*
}

javafig.gui.ModularEditor.call {
method=javafig.gui.ModularEditor.doCancel
method=javafig.gui.ModularEditor.doUndo
method=javafig.gui.ModularEditor.doRedo
method=javafig.gui.ModularEditor.doFlushUndoStack
method=javafig.gui.ModularEditor.doDeleteAll
method=javafig.gui.ModularEditor.doCopyToClipboard
method=javafig.gui.ModularEditor.doCutToClipboard
method=javafig.gui.ModularEditor.doPasteFromClipboard
method=javafig.gui.ModularEditor.doCreateCircle
method=javafig.gui.ModularEditor.doCreateEllipse
method=javafig.gui.ModularEditor.doCreateRectangle
method=javafig.gui.ModularEditor.doCreateRoundRectangle
method=javafig.gui.ModularEditor.doCreatePolyline
method=javafig.gui.ModularEditor.doCreatePolygon
method=javafig.gui.ModularEditor.doCreateSpline
method=javafig.gui.ModularEditor.doCreateClosedSpline
method=javafig.gui.ModularEditor.doCreateBezier
method=javafig.gui.ModularEditor.doCreateClosedBezier
method=javafig.gui.ModularEditor.doCreateArc
method=javafig.gui.ModularEditor.doCreateImage
method=javafig.gui.ModularEditor.doCreateText
method=javafig.gui.ModularEditor.doCreateLink
method=javafig.gui.ModularEditor.doCreateCompound
method=javafig.gui.ModularEditor.doBreakCompound
method=javafig.gui.ModularEditor.doMoveObject
method=javafig.gui.ModularEditor.doCopyObject
method=javafig.gui.ModularEditor.doDeleteObject
method=javafig.gui.ModularEditor.doMovePoint
method=javafig.gui.ModularEditor.doInsertPoint
method=javafig.gui.ModularEditor.doCutPoint
method=javafig.gui.ModularEditor.doMirrorXObject
method=javafig.gui.ModularEditor.doMirrorYObject
method=javafig.gui.ModularEditor.doScaleObject
method=javafig.gui.ModularEditor.doAlignObjects
method=javafig.gui.ModularEditor.doSnapObjectToGrid
method=javafig.gui.ModularEditor.doConvertObject
method=javafig.gui.ModularEditor.doResizeText
method=javafig.gui.ModularEditor.doUpdate
method=javafig.gui.ModularEditor.doCancelUpdate
method=javafig.gui.ModularEditor.enableUpdateAll
method=javafig.gui.ModularEditor.enableUpdateNone
method=javafig.gui.ModularEditor.enableUpdateInvert
method=javafig.gui.ModularEditor.doEditObject
method=javafig.gui.ModularEditor.doEditGlobalAttributes
method=javafig.gui.ModularEditor.doZoomRegion
method=javafig.gui.ModularEditor.doZoomIn
method=javafig.gui.ModularEditor.doZoomOut
method=javafig.gui.ModularEditor.doZoom11
method=javafig.gui.ModularEditor.doPanHome
method=javafig.gui.ModularEditor.doPanLeft
method=javafig.gui.ModularEditor.doPanRight
method=javafig.gui.ModularEditor.doPanUp
method=javafig.gui.ModularEditor.doPanDown
method=javafig.gui.ModularEditor.doSetGridNone
method=javafig.gui.ModularEditor.doSetGridCoarse
method=javafig.gui.ModularEditor.doSetGridMedium
method=javafig.gui.ModularEditor.doSetGridFine
method=javafig.gui.ModularEditor.doSetNoSnap
method=javafig.gui.ModularEditor.doSetSnap12
method=javafig.gui.ModularEditor.doSetSnap14
method=javafig.gui.ModularEditor.doSetSnap18
method=javafig.gui.ModularEditor.doSetUnitsInches
method=javafig.gui.ModularEditor.doSetUnitsMillimeter
method=javafig.gui.ModularEditor.doSetUnitsXfigMillimeter
method=javafig.gui.ModularEditor.doSnapAllObjectsToGrid
method=javafig.gui.ModularEditor.doClearUserColors
method=javafig.gui.ModularEditor.doWriteHadesResource
method=javafig.gui.ModularEditor.doRedraw
method=javafig.gui.ModularEditor.doStartNewDrawing
method=javafig.gui.ModularEditor.doSelectFile
method=javafig.gui.ModularEditor.doMergeFile
method=javafig.gui.ModularEditor.doSelectURL
method=javafig.gui.ModularEditor.doMergeURL
method=javafig.gui.ModularEditor.handleParserCallback
method=javafig.gui.ModularEditor.handleParserMergeCallback
method=javafig.gui.ModularEditor.handleCommandCallback
method=javafig.gui.ModularEditor.doQuit
method=javafig.gui.ModularEditor.doSaveFile
method=javafig.gui.ModularEditor.doSaveFileAs
method=javafig.gui.ModularEditor.doSaveToConsole
method=javafig.gui.ModularEditor.doPrintViaAWT
method=javafig.gui.ModularEditor.doPrintUndoStack
method=javafig.gui.ModularEditor.doPrintClipboard
method=javafig.gui.ModularEditor.doPrintObjects
method=javafig.gui.ModularEditor.doShowMessages
method=javafig.gui.ModularEditor.doShowAboutDialog
method=javafig.gui.ModularEditor.doShowLicenseDialog
method=javafig.gui.ModularEditor.doShowDeadlockDialog
method=javafig.gui.ModularEditor.doShowChangesDialog
method=javafig.gui.ModularEditor.doShowMouseButtonDialog
method=javafig.gui.ModularEditor.doShowShortcutKeysDialog
method=javafig.gui.ModularEditor.doShowFaqDialog
method=javafig.gui.ModularEditor.doShowHelpDialog
method=javafig.gui.ModularEditor.doShowDemoGold
method=javafig.gui.ModularEditor.doShowDemoHouse
method=javafig.gui.ModularEditor.doShowDemoWatch
method=javafig.gui.ModularEditor.doShowDemoCircuit
method=javafig.gui.ModularEditor.doShowDemoLayout
method=javafig.gui.ModularEditor.doShowDemoPictures
method=javafig.gui.ModularEditor.doShowDemoRotated
method=javafig.gui.ModularEditor.doShowDemoUnicode
method=javafig.gui.ModularEditor.doShowDemoWelcome
}

javafig.gui.EditTextDialog.getStatusMessage {
method=javafig.*.getStatusMessage
}

javafig.gui.EditPolylineDialog.getStatusMessage {
method=javafig.*.getStatusMessage
}

javafig.gui.EditEllipseDialog.getStatusMessage {
method=javafig.*.getStatusMessage
}

javafig.gui.EditTriggerDialog.getStatusMessage {
method=javafig.*.getStatusMessage
}

javafig.gui.EditImageDialog.getStatusMessage {
method=javafig.*.getStatusMessage
}

javafig.gui.EditRectangleDialog.getStatusMessage {
method=javafig.*.getStatusMessage
}

javafig.gui.EditGlobalAttributesDialog.getStatusMessage {
method=javafig.*.getStatusMessage
}

javafig.commands.ZoomRegionCommand.execute {
method=javafig.*.doZoomRegion
}

]

java.lang.reflect.Method.invoke [

ajax.analyzer.test.ReflectionTest.main {
method=ajax.analyzer.test.ReflectionTest.*
}

ajax.analyzer.test.ReflectionTest.hello {
method=ajax.analyzer.test.ReflectionTest.*
}

javafig.gui.ModularEditor.call {
method=javafig.gui.ModularEditor.doCancel
method=javafig.gui.ModularEditor.doUndo
method=javafig.gui.ModularEditor.doRedo
method=javafig.gui.ModularEditor.doFlushUndoStack
method=javafig.gui.ModularEditor.doDeleteAll
method=javafig.gui.ModularEditor.doCopyToClipboard
method=javafig.gui.ModularEditor.doCutToClipboard
method=javafig.gui.ModularEditor.doPasteFromClipboard
method=javafig.gui.ModularEditor.doCreateCircle
method=javafig.gui.ModularEditor.doCreateEllipse
method=javafig.gui.ModularEditor.doCreateRectangle
method=javafig.gui.ModularEditor.doCreateRoundRectangle
method=javafig.gui.ModularEditor.doCreatePolyline
method=javafig.gui.ModularEditor.doCreatePolygon
method=javafig.gui.ModularEditor.doCreateSpline
method=javafig.gui.ModularEditor.doCreateClosedSpline
method=javafig.gui.ModularEditor.doCreateBezier
method=javafig.gui.ModularEditor.doCreateClosedBezier
method=javafig.gui.ModularEditor.doCreateArc
method=javafig.gui.ModularEditor.doCreateImage
method=javafig.gui.ModularEditor.doCreateText
method=javafig.gui.ModularEditor.doCreateLink
method=javafig.gui.ModularEditor.doCreateCompound
method=javafig.gui.ModularEditor.doBreakCompound
method=javafig.gui.ModularEditor.doMoveObject
method=javafig.gui.ModularEditor.doCopyObject
method=javafig.gui.ModularEditor.doDeleteObject
method=javafig.gui.ModularEditor.doMovePoint
method=javafig.gui.ModularEditor.doInsertPoint
method=javafig.gui.ModularEditor.doCutPoint
method=javafig.gui.ModularEditor.doMirrorXObject
method=javafig.gui.ModularEditor.doMirrorYObject
method=javafig.gui.ModularEditor.doScaleObject
method=javafig.gui.ModularEditor.doAlignObjects
method=javafig.gui.ModularEditor.doSnapObjectToGrid
method=javafig.gui.ModularEditor.doConvertObject
method=javafig.gui.ModularEditor.doResizeText
method=javafig.gui.ModularEditor.doUpdate
method=javafig.gui.ModularEditor.doCancelUpdate
method=javafig.gui.ModularEditor.enableUpdateAll
method=javafig.gui.ModularEditor.enableUpdateNone
method=javafig.gui.ModularEditor.enableUpdateInvert
method=javafig.gui.ModularEditor.doEditObject
method=javafig.gui.ModularEditor.doEditGlobalAttributes
method=javafig.gui.ModularEditor.doZoomRegion
method=javafig.gui.ModularEditor.doZoomIn
method=javafig.gui.ModularEditor.doZoomOut
method=javafig.gui.ModularEditor.doZoom11
method=javafig.gui.ModularEditor.doPanHome
method=javafig.gui.ModularEditor.doPanLeft
method=javafig.gui.ModularEditor.doPanRight
method=javafig.gui.ModularEditor.doPanUp
method=javafig.gui.ModularEditor.doPanDown
method=javafig.gui.ModularEditor.doSetGridNone
method=javafig.gui.ModularEditor.doSetGridCoarse
method=javafig.gui.ModularEditor.doSetGridMedium
method=javafig.gui.ModularEditor.doSetGridFine
method=javafig.gui.ModularEditor.doSetNoSnap
method=javafig.gui.ModularEditor.doSetSnap12
method=javafig.gui.ModularEditor.doSetSnap14
method=javafig.gui.ModularEditor.doSetSnap18
method=javafig.gui.ModularEditor.doSetUnitsInches
method=javafig.gui.ModularEditor.doSetUnitsMillimeter
method=javafig.gui.ModularEditor.doSetUnitsXfigMillimeter
method=javafig.gui.ModularEditor.doSnapAllObjectsToGrid
method=javafig.gui.ModularEditor.doClearUserColors
method=javafig.gui.ModularEditor.doWriteHadesResource
method=javafig.gui.ModularEditor.doRedraw
method=javafig.gui.ModularEditor.doStartNewDrawing
method=javafig.gui.ModularEditor.doSelectFile
method=javafig.gui.ModularEditor.doMergeFile
method=javafig.gui.ModularEditor.doSelectURL
method=javafig.gui.ModularEditor.doMergeURL
method=javafig.gui.ModularEditor.handleParserCallback
method=javafig.gui.ModularEditor.handleParserMergeCallback
method=javafig.gui.ModularEditor.handleCommandCallback
method=javafig.gui.ModularEditor.doQuit
method=javafig.gui.ModularEditor.doSaveFile
method=javafig.gui.ModularEditor.doSaveFileAs
method=javafig.gui.ModularEditor.doSaveToConsole
method=javafig.gui.ModularEditor.doPrintViaAWT
method=javafig.gui.ModularEditor.doPrintUndoStack
method=javafig.gui.ModularEditor.doPrintClipboard
method=javafig.gui.ModularEditor.doPrintObjects
method=javafig.gui.ModularEditor.doShowMessages
method=javafig.gui.ModularEditor.doShowAboutDialog
method=javafig.gui.ModularEditor.doShowLicenseDialog
method=javafig.gui.ModularEditor.doShowDeadlockDialog
method=javafig.gui.ModularEditor.doShowChangesDialog
method=javafig.gui.ModularEditor.doShowMouseButtonDialog
method=javafig.gui.ModularEditor.doShowShortcutKeysDialog
method=javafig.gui.ModularEditor.doShowFaqDialog
method=javafig.gui.ModularEditor.doShowHelpDialog
method=javafig.gui.ModularEditor.doShowDemoGold
method=javafig.gui.ModularEditor.doShowDemoHouse
method=javafig.gui.ModularEditor.doShowDemoWatch
method=javafig.gui.ModularEditor.doShowDemoCircuit
method=javafig.gui.ModularEditor.doShowDemoLayout
method=javafig.gui.ModularEditor.doShowDemoPictures
method=javafig.gui.ModularEditor.doShowDemoRotated
method=javafig.gui.ModularEditor.doShowDemoUnicode
method=javafig.gui.ModularEditor.doShowDemoWelcome
}

javafig.gui.EditTextDialog.getStatusMessage {
method=javafig.*.getStatusMessage
}

javafig.gui.EditPolylineDialog.getStatusMessage {
method=javafig.*.getStatusMessage
}

javafig.gui.EditEllipseDialog.getStatusMessage {
method=javafig.*.getStatusMessage
}

javafig.gui.EditTriggerDialog.getStatusMessage {
method=javafig.*.getStatusMessage
}

javafig.gui.EditImageDialog.getStatusMessage {
method=javafig.*.getStatusMessage
}

javafig.gui.EditRectangleDialog.getStatusMessage {
method=javafig.*.getStatusMessage
}

javafig.gui.EditGlobalAttributesDialog.getStatusMessage {
method=javafig.*.getStatusMessage
}

javafig.commands.ZoomRegionCommand.execute {
method=javafig.*.doZoomRegion
}

]

"java.lang.ClassLoader.defineClass(java.lang.String, byte[], int, int)" [

]

java.lang.ClassLoader.findSystemClass [

java.util.SystemClassLoader.loadClass {
}

]

java.util.SystemClassLoader.loadClass [

]

java.io.ObjectInputStream.<init> [

ajax.jbc.util.salamis.SalamisCodeLoader.readCode {
serialized=ajax.jbc.util.salamis.*
serialized=ajax.jbc.util.*
serialized=java.util.Hashtable
}

sun.security.provider.IdentityDatabase.fromStream {
serialized=sun.security.*
serialized=java.security.*
}

]

sun.awt.windows.WFontPeer.getFontCharset [

]